Cancer risk estimating method using sequence polymorphisms in a specific
region of chromosome 19

The present invention provides methods and compositions for identifying human
subjects with an increased risk of having or developing cancer. In particular, this
invention relates to the identification and characterization of polymorphisms in
human chromosome 19q, in the region r located approximately at 19q13.2-3, that are
correlated with an increased risk of developing cancer and with the responsiveness
of a subject to various treatments for cancer.

Background 



DNA polymorphisms provide an efficient way to study the association of genes and
diseases by analysis of linkage and linkage disequilibrium. With the sequencing of
the human genome, a myriad of hitherto unknown genetic polymorphisms among
people have been detected. Most common among these are the single nucleotide
polymorphisms, also called SNPs, of which we now know several million. Other
examples are variable number of tandem repeat polymorphisms, insertions, deletions
and block modifications. Tandem repeats often have multiple different alleles
(variants), whereas the other groups of polymorphisms usually have just two alleles.
Some of these genetic polymorphisms probably play a direct role in the biology of
the individuals, including their risk of developing disease, but the virtue of the
majority is that they can serve as markers for the surrounding DNA, and thus serve as
leads during a search for a causative gene polymorphism, as substitutes in the
evaluation of its role in health and disease, and as substitutes in the evaluation of
the genetic constitution of individuals.

The association of an allele of one sequence polymorphism with particular alleles of
other sequence polymorphisms in the surrounding DNA has two origins, known in
the genetic field as linkage and linkage disequilibrium, respectively. Linkage arises
because large parts of chromosomes are passed unchanged from parents to offspring,
so that minor regions of a chromosome tend to flow unchanged from one
generation to the next and also to be similar in different branches of the same
family. Linkage is gradually eroded by recombination occurring in the cells of the
germline, but typically operates over multiple generations and distances of a number
of million bases in the DNA.

Linkage disequilibrium deals with whole populations and has its origin in the (distant)
forefather in whose DNA a new sequence polymorphism arose. The immediate
surroundings in the DNA of the forefather will tend to stay with the new allele for many
generations. Recombination and changes in the composition of the population will
again erode the association, but the new allele and the alleles of any other
polymorphism nearby will often be partly associated among unrelated humans even today. A
crude estimate suggests that alleles of sequence polymorphisms with distances less
than 10000 bases in the DNA will have tended to stay together since modern man
arose. Linkage disequilibrium in limited populations, for instance Europeans, often
extends over longer distances. This can be the result of newer mutations, but can
also be a consequence of one or more "bottlenecks" with small population sizes and
considerable inbreeding in the history of the current population. Two obvious
possibilities for "bottlenecks" in Europeans are the exodus from Africa and the
repopulation of Europe after the last ice age.

Linkage disequilibrium is the result of many stochastic events and as such is subject
to statistical variation, occasionally resulting in discontinuities, lack of a monotonic
relationship between association and distance, and differences between people of
different ethnicity. Therefore, it is often advantageous to study more than one
sequence polymorphism in a given region. This also allows for further definition of the
genetic surroundings of the biologically relevant polymorphism by combining the
associated alleles of the different markers into a so-called haplotype.
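By way of illustration only, the degree of association between the alleles of two
biallelic markers is commonly summarised by the measures D, D' and r squared. The
following Python sketch is not taken from the present description; it assumes that
counts of the four possible two-marker haplotypes are already available, and the
function name and the input layout (keys "AB", "Ab", "aB", "ab") are hypothetical.

```python
def linkage_disequilibrium(hap_counts):
    """Return (D, D_prime, r_squared) for two biallelic markers.

    hap_counts maps the four haplotypes "AB", "Ab", "aB", "ab" to observed
    counts (hypothetical input layout chosen for this sketch).
    """
    total = sum(hap_counts.values())
    p_hap = hap_counts["AB"] / total                      # frequency of haplotype A-B
    p_a = (hap_counts["AB"] + hap_counts["Ab"]) / total   # allele A at marker 1
    p_b = (hap_counts["AB"] + hap_counts["aB"]) / total   # allele B at marker 2

    # D: deviation of the observed haplotype frequency from independence.
    d = p_hap - p_a * p_b

    # D' rescales |D| by the largest value it could take for these allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r squared: squared correlation between the alleles at the two markers.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared
```

With such a function, markers whose alleles always travel together give D' and
r squared equal to 1, whereas independent markers give values near 0.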

Humans in general carry two copies of each human chromosome in each cell. There
are exceptions to this rule, not relevant to this application. We therefore speak about
genotypes, i.e. the combined analysis of both chromosomes at a given sequence
polymorphism. The resulting genotypes of a person, analysed for instance on DNA
from peripheral blood leukocytes, are inherently very stable over time. Therefore,
this type of analysis can be performed at any time in the life of a person and will be
applicable to this person for his or her entire life. By the same token, such genetic
analyses are ideally suited to predict future risks of disease.

A variety of investigations suggest that many diseases are in part determined by the
genetic constitution of the individual. One group of genes in particular has been
associated with rare genetic predispositions to cancer. These are the genes involved
in maintaining the integrity of a person's DNA, the so-called DNA repair genes. One
set of such genes are the XP genes, which participate in nucleotide excision repair
and, when mutated, give rise to a 1000-fold increased risk of getting skin cancer. For
this reason we have previously investigated single nucleotide polymorphisms in one
DNA repair gene, XPD, for association with risk of skin cancer in a cohort of
Caucasian Americans, and found that one allele of the sequence polymorphism called
XPDe6 was associated with a moderately increased risk of getting basal cell
carcinoma, the most common form of skin cancer. Later, other groups have studied the
association between sequence polymorphisms in this and other DNA repair genes
and various forms of cancer. Some have reported positive results.

Very little is known about the function of the gene RAI. It was cloned because its
protein product binds to and inhibits RelA of the transcription regulator NF-kappaB.

Summary of the invention 

The present invention relates in a first aspect to a group of nucleic acid sequences
found to be associated with cancer. The invention further relates to transcriptional
and translational products of said sequences. An allele in the r region can be
identified as correlated with an increased risk of developing cancer, the prognosis of
developed cancer, and responsiveness to cancer treatment on the basis of statistical
analyses of the incidence of a particular allele in individuals diagnosed with cancer.
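As an illustration of the kind of statistical analysis referred to, the association
between an allele and disease in a case-control comparison is often expressed as an
odds ratio. The following Python sketch is an assumption of this edition and not the
analysis used in the Examples; the function name, the argument names and the use of
Woolf's logarithmic approximation for the 95 % confidence interval are choices made
for illustration only.

```python
import math


def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Odds ratio with an approximate 95 % confidence interval for a 2x2 table.

    The interval uses Woolf's approximation on the log scale, which requires
    all four cell counts to be positive.
    """
    a, b, c, d = case_carriers, case_noncarriers, control_carriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this approximation")

    or_value = (a * d) / (b * c)

    # Standard error of log(OR) and the resulting symmetric interval on the log scale.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(or_value) - 1.96 * se_log_or)
    high = math.exp(math.log(or_value) + 1.96 * se_log_or)
    return or_value, low, high
```

An odds ratio above 1 whose interval excludes 1 would suggest that carriers of the
allele are over-represented among the cases; the counts supplied to the function are
entirely up to the user and none are implied here.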

Thus, in a first aspect the invention relates to a method for estimating the cancer risk
of an individual comprising

- providing a sample from said individual,

- assessing in the genetic material including human genes in said sample a
sequence polymorphism

  - in a region corresponding to SEQ ID NO: 2, or a part thereof, or

  - in a region complementary to SEQ ID NO: 2, or a part thereof, or

  - in a transcription product from a sequence in a region corresponding to SEQ
  ID NO: 2, or a part thereof, or

  - in a translation product from a sequence in a region corresponding to SEQ ID
  NO: 2, or a part thereof,

- obtaining a sequence polymorphism response,

- estimating the cancer risk of said individual based on the sequence
polymorphism response.

Preferably the invention relates to a method for estimating the cancer risk of an
individual comprising

- providing a sample from said individual,

- assessing in the genetic material including human genes in said sample a
sequence polymorphism

  - in a region corresponding to SEQ ID NO: 1, or a part thereof, or

  - in a region complementary to SEQ ID NO: 1, or a part thereof, or

  - in a transcription product from a sequence in a region corresponding to SEQ
  ID NO: 1, or a part thereof, or

  - in a translation product from a sequence in a region corresponding to SEQ ID
  NO: 1, or a part thereof,

- obtaining a sequence polymorphism response,

- estimating the cancer risk of said individual based on the sequence
polymorphism response.

The estimation of the cancer risk of an individual can involve the comparison of the
number and/or kind of polymorphic sequences identified with a predetermined
cancer risk profile. Such a profile can be based on statistical data obtained for a
relevant reference group of individuals.

The sequence of the r region is set forth as SEQ ID NO: 1, originating from the
cloning of human chromosome 19q published as part of the contig NTJVM 109 in the
database of human sequences established by the National Center for Biotechnology
Information and located on the internet at

http://www.ncbi.nlm.nih.gov/genome/guide/human/

The presence of an allele is determined by determining the nucleic acid sequence of
all or part of the region, or of products of the nucleic acid sequences, according to
standard molecular biology protocols well known in the art, as described for example
in Sambrook et al. (1989) and as set forth in the Examples provided herein.

In particular, the nucleic acid molecules of the present invention represent in a first
aspect nucleic acid sequences forming part of the region r corresponding to position
1522-37752 of SEQ ID NO: 1, and preferably to certain nucleic acid sequences
within the gene referred to herein as RAI. As demonstrated in the Examples
presented below, the RAI gene is in particular associated with human cancer diseases.

Furthermore, the invention relates to a method for estimating the cancer prognosis
of an individual comprising

- providing a sample from said individual,

- assessing in the genetic material including human genes in said sample a
sequence polymorphism

  - in a region corresponding to SEQ ID NO: 1 or SEQ ID NO: 2, or a part
  thereof, or

  - in a region complementary to SEQ ID NO: 1 or SEQ ID NO: 2, or a part
  thereof, or

  - in a transcription product from a sequence in a region corresponding to SEQ
  ID NO: 1 or SEQ ID NO: 2, or a part thereof, or

  - in a translation product from a sequence in a region corresponding to SEQ ID
  NO: 1 or SEQ ID NO: 2, or a part thereof,

- obtaining a sequence polymorphism response,

- estimating the cancer prognosis of said individual based on the sequence
polymorphism response.

The estimation of the cancer prognosis of an individual can involve the comparison
of the number and/or kind of polymorphic sequences identified with a predetermined
cancer prognosis profile. Such a profile can be based on statistical data obtained for
a relevant reference group of individuals.

Additionally provided is a method of identifying a human subject as having an
increased likelihood of responding to a treatment, comprising a) correlating the
presence of an r region allele genotype with an increased likelihood of responding to
treatment; and b) determining the r region allele genotype of the subject, whereby a
subject having an r region allele genotype correlated with an increased likelihood of
responding to treatment is identified as having an increased likelihood of responding
to treatment.

Thus, the present invention also relates to a method for estimating a treatment
response of an individual suffering from cancer to a cancer treatment, comprising

- providing a sample from said individual,

- assessing in the genetic material including human genes in said sample a
sequence polymorphism

  - in a region corresponding to SEQ ID NO: 1 or SEQ ID NO: 2, or a part
  thereof, or

  - in a region complementary to SEQ ID NO: 1 or SEQ ID NO: 2, or a part
  thereof, or

  - in a transcription product from a sequence in a region corresponding to SEQ
  ID NO: 1 or SEQ ID NO: 2, or a part thereof, or

  - in a translation product from a sequence in a region corresponding to SEQ ID
  NO: 1 or SEQ ID NO: 2, or a part thereof,

- obtaining a sequence polymorphism response,

- estimating the individual's response to the cancer treatment based on the
sequence polymorphism response.

The estimation of the individual's response to cancer treatment can involve the
comparison of the number and/or kind of polymorphic sequences identified with a
predetermined cancer treatment response profile. Such a profile can be based on
statistical data obtained for a relevant reference group of individuals.

The invention also comprises primers or probes for use in the invention, as well as
kits including these. The primers and/or probes are preferably capable of hybridising
to SEQ ID NO: 1 or SEQ ID NO: 2, or a part thereof, in particular the r region, or a
part thereof, under stringent conditions.
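For illustration, a probe intended to hybridise to one strand of a region has the
sequence of the reverse complement of that strand, and a rough melting temperature
for a short oligonucleotide can be estimated with the Wallace rule (2 degrees C per
A or T, 4 degrees C per G or C). The Python sketch below is not part of the
described invention; the function names are hypothetical, and the Wallace rule is
only a coarse approximation that does not by itself define stringent conditions.

```python
# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def wallace_tm(oligo):
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C).

    Intended only for short oligonucleotides; other bases are ignored.
    """
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


# Example input: the first 20 bases of a boundary sequence quoted in this
# description, used here purely as sample data and not as a designed primer.
example = "AGAACCCCCGCCCCTCCACC"
probe = reverse_complement(example)
```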

Furthermore, the invention also relates to cloning vectors and expression vectors
containing the nucleic acid molecules of the invention, as well as hosts which have
been transformed with such nucleic acid molecules, including cells genetically
engineered to contain the nucleic acid molecules of the invention, and/or cells
genetically engineered to express the nucleic acid molecules of the invention. The nucleic
acids are preferably isolated from the r region and preferably contain one or more
sequence polymorphisms as described herein below in more detail. In addition to
host cells and cell lines, hosts also include transgenic non-human animals (or
progeny thereof).

In particular, the present invention is based on the discovery of the correlation between
single nucleotide polymorphisms (SNPs) and/or tandem repeats in the r region and
cancer. Thus, SNPs have been found in the r region as shown in Table 1. However,
the present invention is not limited to the SNPs shown in Table 1, but does include
any SNP in the region. Tandem repeats have been found in the r region as shown in
Table 2. However, the present invention is not limited to the tandem repeats shown
in Table 2, but does include any tandem repeat in the region.

The term human includes both a human having or suspected of having a cancer
disease and an asymptomatic human who may be tested for predisposition or
susceptibility to cancer. At each position the human may be homozygous for an allele or
the human may be a heterozygote.

Drawings

Fig. 1 shows a subregion of chromosome 19q.

Fig. 2 shows odds ratios and p-values for individual sequence variations in relation
to risk of basal cell carcinoma.

Fig. 3 shows odds p-values for association of different sequence variations with risk
of basal cell carcinoma among psoriatic Danes.

Fig. 4 shows regions S1, S2 and S3 of SEQ ID NO: 2.

Detailed description of the invention 

The present invention relates to a characterization of a person's present and/or
future risk of getting certain forms of cancer. The characterization is based on the
analysis of sequence polymorphisms in a region of chromosome 19q in the person.

A number of polymorphisms in the chromosomal region 19q13.2-3 have been
identified and characterised. Surprisingly, the sequence polymorphisms with strongest
association to disease appeared to be located outside XPD. More specifically, the
sequences were located in a sub-region between XPD and ERCC1, and seemed to
have a maximum in or around the gene RAI (see Example 1). For persons getting
their skin cancer relatively early (before 50 years of age), it was found that
predictions got better (Example 2), and when two sequence polymorphisms in RAI were
combined, the prediction of early skin cancer got even better (Example 3). It was
also possible to combine sequence polymorphisms in RAI with sequence
polymorphisms outside the region and get highly positive results (Example 4).



The region of chromosome 19q, more precisely the region located in 19q13.2-3, with
which the present invention is concerned, is depicted in Figure 1 as it is presently
known together with the presently known or suspected genes. The arrows indicate
the directions of transcription of the genes. The absolute chromosome positions
shown are from the particular build of NCBI's map of chromosome 19, and will
probably change with time.

The region s stretches from the XPD gene to approximately the end of ERCC1,
includes the region r, and is defined by SEQ ID NO: 2. In the present context the
region s means SEQ ID NO: 2 and complementary sequence as well as
transcriptional products and translational products thereof.

One preferred section of the region s is S1 as shown in Fig. 4, more preferred S2 as
shown in Fig. 4, most preferred S3 as shown in Fig. 4.

The region r stretches from the beginning of, but not including, the XPD gene, to
approximately the end of ERCC1 and includes the genes RAI, LOC162978, and
ASE-1. More specifically, r is bounded by and includes the following two sequences:
AGAACCCCCG CCCCTCCACC TCGTCTCAAA and TCCCTCCCCA GAGACTGCAC
CAGCGCAGCC, and is defined by SEQ ID NO: 1.

In the present context the region r means SEQ ID NO: 1 and complementary
sequence as well as transcriptional products and translational products thereof.

One preferred section of the region r stretches approximately from the end of RAI to
the end of ASE-1 and includes the genes RAI, LOC162978, and ASE-1. More
specifically, this section of r is bounded by and includes the following sequences:
GAAGTGAGCC AAGATCACGC CACTGCACTC and GTGCCCACCT GGGCCACCAG
AAGGTGACAC. In the present context the region r means SEQ ID NO: 1
bases 1522-37752 and complementary sequence as well as transcriptional products
and translational products thereof.

Finally, in the claims the gene RAI is defined as including transcribed sequences of
the gene plus a 1500 base upstream promoter region. More specifically, RAI is
bounded by and includes the following sequences: CATAACCACA ATGATGAGCA
TGTATTGAGT and ATGTTGTCCA GGCTGGTCTT GAACTCCTGA. In the present
context this section of the region r relates to SEQ ID NO: 1 bases 7761-22885 and
complementary sequence as well as transcriptional products and translational
products thereof.
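The sections defined above are given both as base coordinates in SEQ ID NO: 1 and
as bounding sequences that are included in the section. The following Python sketch
shows, under stated assumptions, how such a section could be extracted and checked
once the sequence is available as a string. It assumes that the coordinates are
1-based and inclusive; the function names and the dictionary of sections are
hypothetical, and the sequence itself is not reproduced here.

```python
def subregion(seq, start, end):
    """Return bases start..end of seq using 1-based, inclusive coordinates."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("coordinates outside the sequence")
    return seq[start - 1:end]


def section_length(start, end):
    """Number of bases in a 1-based, inclusive interval."""
    return end - start + 1


def bounded_by(region, left, right):
    """True if region begins with left and ends with right.

    Spaces in the bounding sequences (printed in blocks of ten) are ignored.
    """
    left = left.replace(" ", "").upper()
    right = right.replace(" ", "").upper()
    region = region.upper()
    return region.startswith(left) and region.endswith(right)


# Coordinates quoted in the text, in SEQ ID NO: 1 numbering.
SECTIONS = {
    "preferred section of r": (1522, 37752),
    "RAI": (7761, 22885),
}
```

For example, `bounded_by(subregion(seq, *SECTIONS["RAI"]), "CATAACCACA ATGATGAGCA
TGTATTGAGT", "ATGTTGTCCA GGCTGGTCTT GAACTCCTGA")` would be expected to hold when
`seq` contains SEQ ID NO: 1, which also gives a simple check against changes in
future maps.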



Modifications to the human genome map are known to occur from time to time. It is
therefore possible that the defining sequences quoted above will change slightly in
future maps.

Fragments or parts of the region s or r as used herein relate to any fragment of at
least 100 nucleic acid residues in length, or multiples of 100 nucleic acid residues in
length, starting from SEQ ID NO: 1 position 1, 100, 200, 300, 400, 500, 600, 700,
800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000,
2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, and so forth,
each fragment starting position having an increment of 100 nucleic acid residues.
Multiples are preferably multiples of e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 and 50.

For fragments starting at position 1, the length of said fragments will thus be e.g.
100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500,
1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700,
2800, 2900, 3000, and so forth, using suitable multiplicators as listed herein above.

For fragments starting at position 100, the length of said fragments will thus be e.g.
100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500,
1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700,
2800, 2900, 3000, and so forth, using suitable multiplicators as listed herein above.

For fragments starting at position 7700, the length of said fragments will thus be e.g.
100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500,
1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700,
2800, 2900, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000,
8500, 9000, 9500, 10000, 10500, 11000, 11500, 12000, 12500, 13000, 13500,
14000, 14500, 15000, and so forth, using suitable multiplicators such as e.g. the
ones listed herein above.
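The family of fragments described above can be enumerated mechanically. The
following Python sketch is given for illustration only and rests on assumptions:
positions are taken as 1-based, a fragment of length L starting at position p is
taken to end at position p + L - 1, and the function names and default values are
hypothetical.

```python
def fragment_starts(seq_length, step=100):
    """Start positions 1, 100, 200, ... up to seq_length, as listed in the text."""
    return [1] + list(range(step, seq_length + 1, step))


def fragments_from(start, seq_length, unit=100, max_multiple=50):
    """Return (start, end) pairs, 1-based and inclusive, for one start position.

    Lengths are unit * 1, unit * 2, ... unit * max_multiple; fragments that
    would run past the end of the sequence are omitted.
    """
    fragments = []
    for multiple in range(1, max_multiple + 1):
        end = start + multiple * unit - 1
        if end > seq_length:
            break
        fragments.append((start, end))
    return fragments
```

Larger values of `max_multiple` would cover the longer fragments mentioned for
position 7700, for which lengths up to 15000 and beyond are listed.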

The nucleic acid sequences according to the present invention make it possible to
estimate cancer risk in an individual by using sequence polymorphisms originating
from a specific region of chromosome 19.

Estimation of cancer risks has a number of important applications:

(1) Individuals with reasons to suspect that they are at risk for getting cancer would
be able to clarify their situation and, if possible, take protective action. Alternatively,
anti-cancer campaigns, companies, hospitals or other institutions could offer a
service to help people clarify their situation. It would for instance be possible to test
persons when they got their first basal cell carcinoma, which is often recurrent and also
is a moderate predictor for other cancers. If the persons were in a high-risk group,
one could then advise them about, or they could of their own accord choose,
risk-reducing behaviour, such as avoidance of excessive sun exposure, abstaining from
smoking etc. About 5 percent of the Danish population will at some point in their life
get a basal cell carcinoma.

(2) Anti-cancer campaigns, companies, hospitals or other institutions would be able
to define relevant target subpopulations and focus information on risk-reducing
behaviour on these persons. They might perhaps also be in a position to inform the
remainder of the population that they need not worry. Lung cancer affects
approximately 10-15 percent of smokers and thus approximately 5 percent of the
population, somewhat varying from country to country. Malignant melanoma, a
sun-induced, often lethal form of skin cancer, affects approximately 700 persons a year
in Denmark or about 1 percent of the Danish population.

(3) The drugs used in cancer treatment are often carcinogenic themselves and
individual responses to them vary considerably, both with respect to tolerance to the
treatment and with respect to efficacy of the treatment. It is an obvious possibility
that the region of chromosome 19 here dealt with, which contains DNA repair genes
known to modulate carcinogen responses, also modulates response to anti-cancer
agents. Hence, analysis of the region may facilitate better choices of treatment for
cancer, and/or help predict the future course of disease.

By sequence polymorphism is understood any single nucleotide, tandem repeat,
insert, deletion or block polymorphism which varies among humans, whether it is of
biological importance or not.

Position of sequence polymorphism in the region s and r 

In one embodiment of the methods of the invention, preferably the method for
diagnosis as described herein, one or more single nucleotide polymorphism(s) at a
predetermined position in the region r (SEQ ID NO: 1) are identified and used for e.g.
cancer risk profiling and/or cancer treatment response profiling. Presently preferred
single nucleotide polymorphism(s) are listed in Table 1. However, the present
invention relates to any SNP in the r region.



Table 1

Identification in dbSNP (1)   Alleles   Position in SEQ ID NO: 1
rs#3138378                    A/G       137
rs#3138376                    QfT       235
rs#209725                     C/A       ambiguous location
rs#2377328                    C/T       7199
rs#6966                       A/T       7887 (=RAIe6)
rs#2017154                    A/C       12115
rs#2017104                    A/G       12190
rs#2070830                    T/G       14575
rs#1070764                    A/G       15798 (=RAIi1)
rs#2226949                    T/G       32035
rs#959457                     C/T       32446
rs#2336218                    C/A       32447
rs#766934                     A/G       32481
rs#928911                     C/T       32785
rs#1005165                    ex-       33974
rs#1005166                    C/T       34119
rs#967591                     A/G       34858 (=ASE-1e1)
rs#1046282                    T/C       35598
rs#2013521                    A/T       36254
rs#735482                     A/C       36926
rs#762562                     A/G       37267
rs#2336919                              ambiguous location
rs#743571                     C/G       37786

(1) dbSNP is the database of SNPs established by the National Center for
Biotechnology Information and located on the internet at http://www.ncbi.nlm.nih.gov/SNP/.

In another embodiment of the invention, the method for diagnosis described herein
is preferably one in which the tandem repeat is at a position as described in Table 2:

Table 2

Identification in UniSTS (2)
D19S908
STS-W67936
D19S543
D19S393
STS-R46186
GDB:161515
RH47033
GDB:1Q0019

(2) UniSTS is a database of unique sequence tag sites established by the National
Center for Biotechnology Information and located on the internet at

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unists

In another embodiment of the invention, the method for diagnosis described herein
is preferably one in which the sequence polymorphism is in region r. Testing for the
presence of the RAI gene allele is especially preferred because, without wishing to
be bound by theoretical considerations, of its association with increased risk of
cancer (as explained herein).

In one embodiment of the methods of the invention, preferably the method for
diagnosis as described herein, one or more single nucleotide polymorphism(s) at a
predetermined position in the region s (SEQ ID NO: 2) are identified and used for e.g.
cancer risk profiling and/or cancer treatment response profiling. Presently preferred
polymorphism(s) are the four base pair deletion shown in Fig. 4 corresponding to
TGTC. However, the present invention relates to any polymorphism and SNP in the
s region.

The sequence polymorphism of the invention comprises at least one base
difference, such as at least two base differences. As described above, the sequence
polymorphism comprises at least one single nucleotide polymorphism, such as at least
two single nucleotide polymorphisms. Also, the sequence polymorphism comprises
at least one tandem repeat polymorphism, such as at least two tandem repeat
polymorphisms.

Also, the sequence polymorphism may be a combination of single nucleotide
polymorphisms and tandem repeats.

The status of the individual may be determined by reference to allelic variation at
one, two, three, four or more of the above loci.

Cell sample

The cell sample used in the present invention may be any suitable cell sample
capable of providing the genetic material for use in the method. In a preferred
embodiment, the cell sample is a blood sample, a tissue sample, a sample of secretion,
semen, ovum, a washing of a body surface (e.g. a buccal swab), a clipping of a
body surface (hairs or nails), such as wherein the cell is selected from white blood
cells and tumour tissue.

It will be appreciated that the test sample may equally be a nucleic acid sequence
corresponding to the sequence in the test sample, that is to say that all or a part of
the region in the sample nucleic acid may firstly be amplified using any convenient
technique, e.g. PCR, before use in the analysis of variation in the region.

Detection methods 

Detection may be conducted on the sequence of SEQ ID NO: 1, SEQ ID NO: 2 or
a complementary sequence as well as on transcriptional (mRNA) and translational
products (polypeptides, proteins) therefrom.

It will be apparent to the person skilled in the art that there are a large number of
analytical procedures which may be used to detect the presence or absence of
variant nucleotides at one or more of the positions mentioned herein in the r region.
Mutations or polymorphisms within or flanking the r region can be detected by utilizing a
number of techniques. Nucleic acid from any nucleated cell can be used as the
starting point for such assay techniques, and may be isolated according to standard
nucleic acid preparation procedures that are well known to those of skill in the art. In
general, the detection of allelic variation requires a mutation discrimination
technique, optionally an amplification reaction and a signal generation system. Table 3
lists the abbreviations used, and Table 4 lists a number of mutation detection
techniques, some based on the PCR. These may be used in combination with a number
of signal generation systems, a selection of which is listed after Table 4. Further
amplification techniques are listed in Table 5.
Many current methods for the detection of allelic variation are reviewed by Nollau et
al., Clin. Chem. 43, 1114-1120, 1997; and in standard textbooks, for example
"Laboratory Protocols for Mutation Detection", Ed. by U. Landegren, Oxford University
Press, 1996 and "PCR", 2nd Edition by Newton & Graham, BIOS Scientific
Publishers Limited, 1997.

Table 3

Abbreviations:

ALEX.TM.  Amplification refractory mutation system linear extension
APEX      Arrayed primer extension
ARMS.TM.  Amplification refractory mutation system
b-DNA     Branched DNA
CMC       Chemical mismatch cleavage
bp        base pair
COPS      Competitive oligonucleotide priming system
DGGE      Denaturing gradient gel electrophoresis
FRET      Fluorescence resonance energy transfer
LCR       Ligase chain reaction
MASDA     Multiple allele specific diagnostic assay
NASBA     Nucleic acid sequence based amplification
OLA       Oligonucleotide ligation assay
PCR       Polymerase chain reaction
PTT       Protein truncation test
RFLP      Restriction fragment length polymorphism
SDA       Strand displacement amplification
SNP       Single nucleotide polymorphism
SSCP      Single-strand conformation polymorphism analysis
SSR       Self sustained replication
TGGE      Temperature gradient gel electrophoresis



Table 4 illustrates various mutation detection techniques capable of being used for 
20 SNP detection. 



25 



Table 4 

General techniques: DNA sequencing, Sequencing by hybridisation, SNAPshOt. 

Scanning techniques: PJT, SSCP. DOGE, TGGE, Cleavase, Heteruduplex analy- 
sis, CMC, Enzymatic mismatch cleavage 



30 



35 



Hybridisation Based techniques 

Solid phase hybridisation: Dot blots, MASDA, Reverse dot blots. Oligonucleotide 
arrays (DNA Chips) 

Solution phase hybridisation: Taqman TM.-U.S. Pat No. 5.210.015 & 5,487,072 
(Hoffmann-La Roche), Molecular Beacons-Tyagi et al (1996), Nature Biotechnol- 
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ogy, 14, 303; WO 95/13399 (Public Health Inst, New York), Ughtcyder, optionally in 
combination with FRET. 

Extension Based: ARMS.TM.. ALEX.TM.-Ewopean Patent No. EP 332435 B1 
5 (Zeneca Limited). COPS-Gibbs et a! (1989), Nucleic Acids Research, 17, 2347. 

Incorporation Based; Mini-sequencing. APEX 

Restriction Enzyme Based: RFLP. Restriction site generating PCR 

Ligation Based: OLA 

Other: Invader assay 



10 



1 5 Various Signal Generation or Detection Systems is listed below: 

Fluorescence: FRET, Fluorescence quenching, Fluorescence polarisation - United
Kingdom Patent No. 2228998 (Zeneca Limited)

Other: Chemiluminescence, Electrochemiluminescence, Raman, Radioactivity,
Colorimetric, Hybridisation protection assay, Mass spectrometry



Table 5 illustrates examples of further amplification techniques.

Table 5

SSR, NASBA, LCR, SDA, b-DNA

Preferred mutation detection techniques include ARMS.TM., ALEX.TM., COPS,
Taqman, Molecular Beacons, RFLP, and restriction site based PCR and FRET
techniques.

Particularly preferred methods include FRET, Taqman, ARMS.TM. and RFLP based
methods.

In a preferred embodiment, mutations or polymorphisms can be detected by using a
microassay of nucleic acid sequences immobilized to a substrate or "gene chip"
(see, e.g., Cronin, et al., 1996, Human Mutation 7:244-255).

Further, improved methods for analyzing DNA polymorphisms, which can be utilized
for the identification of region r specific mutations, have been described that
capitalize on the presence of variable numbers of short, tandemly repeated DNA
sequences between the restriction enzyme sites. For example, Weber (U.S. Pat. No.
5,075,217) describes a DNA marker based on length polymorphisms in blocks of
(dC-dA)n-(dG-dT)n short tandem repeats. The average separation of
(dC-dA)n-(dG-dT)n blocks is estimated to be 30,000-60,000 bp. Markers that are so
closely spaced exhibit a high frequency of co-inheritance, and are extremely useful
in the identification of genetic mutations, such as, for example, mutations within
the RAI gene, and the diagnosis of diseases and disorders related to RAI mutations.

Also, Caskey et al. (U.S. Pat. No. 5,364,759) describe a DNA profiling assay for
detecting short tri and tetra nucleotide repeat sequences. The process includes
extracting the DNA of interest, such as the RAI gene, amplifying the extracted DNA,
and labelling the repeat sequences to form a genotypic map of the individual's DNA.

The level of RAI gene expression can also be assayed. For example, RNA from a
cell type or tissue known, or suspected, to express the RAI gene, such as brain,
may be isolated and tested utilizing hybridization or PCR techniques such as are
described above. The isolated cells can be derived from cell culture or from a
patient. The analysis of cells taken from culture may be a necessary step in the
assessment of cells to be used as part of a cell-based gene therapy technique or,
alternatively, to test the effect of compounds on the expression of the RAI gene.
Such analyses may reveal both quantitative and qualitative aspects of the expression
pattern of the RAI gene, including activation or inactivation of RAI gene expression.

In one embodiment of such a detection scheme, a cDNA molecule is synthesized
from an RNA molecule of interest (e.g., by reverse transcription of the RNA
molecule into cDNA). A sequence within the cDNA is then used as the template for a
nucleic acid amplification reaction, such as a PCR amplification reaction, or the
like. The nucleic acid reagents used as synthesis initiation reagents (e.g., primers)
in the reverse transcription and nucleic acid amplification steps of this method are
chosen from among the RAI gene nucleic acid reagents described above. The preferred
lengths of such nucleic acid reagents are at least 9-30 nucleotides. For detection of
the amplified product, the nucleic acid amplification may be performed using
radioactively or non-radioactively labeled nucleotides. Alternatively, enough
amplified product may be made such that the product may be visualized by standard
ethidium bromide staining or by utilizing any other suitable nucleic acid staining
method.

Additionally, it is possible to perform such RAI gene expression assays "in situ",
i.e., directly upon tissue sections (fixed and/or frozen) of patient tissue obtained
from biopsies or resections, such that no nucleic acid purification is necessary.
Nucleic acid reagents such as those described above may be used as probes and/or
primers for such in situ procedures (see, for example, Nuovo, G. J., 1992, "PCR In
Situ Hybridization: Protocols And Applications", Raven Press, NY).

Alternatively, if a sufficient quantity of the appropriate cells can be obtained,
standard Northern analysis can be performed to determine the level of mRNA
expression of the RAI gene.






Activity of the gene 



Another method for detecting sequence polymorphism is by analysing the activity of
gene products resulting from the sequences. Accordingly, in one embodiment the
detection uses the activity of the RAI gene product as compared to a reference in
the method. In particular, if the activity of the gene product is decreased or
increased by at least or about 50%, such as at least or about 40%, for example at
least or about 30%, such as at least or about 20%, for example at least or about
10%, such as at least or about 5%, for example at least or about 2%, it indicates a
sequence polymorphism in the gene.

Mutations outside the region

The present invention may combine the result of sequence polymorphism within the
region r or s with sequence polymorphism outside the region in order to increase the
probability of the correlation.

Primers

The primer nucleotide sequences of the invention further include: (a) any
nucleotide sequence that hybridizes to a nucleic acid molecule of the region r or
its complementary sequence or RNA products under stringent conditions, e.g.,
hybridization to filter-bound DNA in 6x sodium chloride/sodium citrate (SSC) at
about 45°C followed by one or more washes in 0.2x SSC/0.1% SDS at about 50-65°C, or
(b) under highly stringent conditions, e.g., hybridization to filter-bound nucleic
acid in 6x SSC at about 45°C followed by one or more washes in 0.1x SSC/0.2% SDS at
about 68°C, or under other hybridization conditions which are apparent to those of
skill in the art (see, for example, Ausubel F.M. et al., eds., 1989, Current
Protocols in Molecular Biology, Vol. I, Green Publishing Associates, Inc., and John
Wiley & Sons, Inc., New York, at pp. 6.3.1-6.3.6 and 2.10.3). Preferably the nucleic
acid molecule that hybridizes to the nucleotide sequence of (a) and (b), above, is
one that comprises the complement of a nucleic acid molecule of the region r or a
complementary sequence or RNA product thereof. In a preferred embodiment, nucleic
acid molecules comprising the nucleotide sequences of (a) and (b) comprise a nucleic
acid molecule of RAI or a complementary sequence or RNA product thereof.

Among the nucleic acid molecules of the invention are deoxyoligonucleotides
("oligos") which hybridize under highly stringent or stringent conditions to the
nucleic acid molecules described above. In general, for probes between 14 and 70
nucleotides in length the melting temperature (Tm) is calculated using the formula:

Tm(°C)=81.5+16.6(log[monovalent cations (molar)])+0.41(% G+C)-(500/N)

where N is the length of the probe. If the hybridization is carried out in a solution
containing formamide, the melting temperature is calculated using the equation

Tm(°C)=81.5+16.6(log[monovalent cations (molar)])+0.41(% G+C)-(0.61% formamide)-(500/N)

where N is the length of the probe. In general, hybridization is carried out at about
20-25 degrees below Tm (for DNA-DNA hybrids) or 10-15 degrees below Tm (for
RNA-DNA hybrids).
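The melting temperature formula and the hybridization offsets stated above can be sketched in a few lines of Python. This is only an illustration: the function and parameter names are hypothetical, and "log" is assumed to mean the base-10 logarithm, which the text does not state explicitly.

```python
import math


def melting_temperature(n, gc_percent, monovalent_molar, formamide_percent=0.0):
    """Estimate Tm in degrees C for a probe of n nucleotides.

    Implements Tm = 81.5 + 16.6*log10([monovalent cations]) + 0.41*(%G+C)
                    - 0.61*(%formamide) - 500/N
    Assumption: "log" in the text is taken as log base 10.
    """
    if not 14 <= n <= 70:
        raise ValueError("the formula is stated for probes of 14-70 nucleotides")
    if monovalent_molar <= 0:
        raise ValueError("cation concentration must be positive")
    return (81.5
            + 16.6 * math.log10(monovalent_molar)
            + 0.41 * gc_percent
            - 0.61 * formamide_percent
            - 500.0 / n)


def hybridization_range(tm, hybrid="DNA-DNA"):
    """Return (low, high) hybridization temperatures below Tm, per the text."""
    offsets = {"DNA-DNA": (20.0, 25.0), "RNA-DNA": (10.0, 15.0)}
    smaller, larger = offsets[hybrid]
    return (tm - larger, tm - smaller)


if __name__ == "__main__":
    # Hypothetical probe: 20 nt, 50% G+C, 1 M monovalent cations, no formamide.
    tm = melting_temperature(20, 50.0, 1.0)
    print(round(tm, 1), hybridization_range(tm))
```

For the hypothetical 20-mer in the usage lines, the cation term vanishes at 1 M, so the estimate is driven by the G+C and length terms and comes out at about 77 °C.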

Exemplary highly stringent conditions may refer, e.g., to washing in 6x SSC/0.05%
sodium pyrophosphate at 37°C (for about 14-base oligos), 48°C (for about 17-base
oligos), 55°C (for about 20-base oligos), and 60°C (for about 23-base oligos).

Accordingly, the invention further provides nucleotide primers or probes which
detect the r region polymorphisms of the invention. The assessment may be conducted
by means of at least one nucleic acid primer or probe, such as a primer or probe of
DNA, RNA or a nucleic acid analogue such as peptide nucleic acid (PNA) or locked
nucleic acid (LNA). The nucleotide primer or probe is preferably capable of
hybridising to a subsequence of the region corresponding to SEQ ID NO: 1, or a part
thereof, or a region complementary to SEQ ID NO: 1.

According to one aspect of the present invention there is provided an allele-specific
oligonucleotide probe capable of detecting a r region polymorphism at one or more
of positions in the r region as defined by the positions in SEQ ID NO: 1.

The allele-specific oligonucleotide probe is preferably 5-50 nucleotides, more
preferably about 5-35 nucleotides, more preferably about 5-30 nucleotides, more
preferably at least 9 nucleotides.

The design of such probes will be apparent to the molecular biologist of ordinary
skill. Such probes are of any convenient length such as up to 50 bases, up to 40
bases, more conveniently up to 30 bases in length, such as for example 8-25 or 8-15
bases in length. In general such probes will comprise base sequences entirely
complementary to the corresponding wild type or variant locus in the region.
However, if required one or more mismatches may be introduced, provided that the
discriminatory power of the oligonucleotide probe is not unduly affected. The probes
of the invention may carry one or more labels to facilitate detection.

In one embodiment the primers and/or probes are capable of hybridizing to a
subsequence selected from the group of subsequences below, wherein the
polymorphism is denoted as for example T/C:

1. GCTCTGAAAC TTACTAGCCC (A/G) GTATTTATGG AGAGGCATTT
2. GTGGTCAAAT TCTCATTCAT CGTGG (T/C) CCAGGCAAGC ACACTTCCTC
3. ACCCTGAGGT GAGCACCTGT TCCTT (C/T) TCCTTGCCCT TAGCCCAGAG GTAGA
4. GGGCAGGGGT TTGTGCCTCC AATGA (G/A) CACAAGCTCC CCCTGCCCCC CAACT
5. CCTGGCGGTG GCCGTCACCA GCTTT (T/C) GGGGGTGTTT GGGAAGCTGG
6. CTCGAGCCCC ACTGTTCCCT (A/G) GGCCCTATTG GTCCCCCTGG
7. ACAAGGAGGA GGGAGAAGTG AGGTT (G/C) AAACCCACTG CCCAATCTTA
8. CCAACACGGT GAAACCCCGT CTGTA (T/C) TAAAAATACA AAAATTAGCC
9. AATCCAGGAC CCCATAATCT TCCGT (C/T) ATCTAAAACA ATAATGGTGA
10. CCCAAGGGGG CGAGGGGAGG GTGAA (A/G) GGGTGGGACG GGGGCAGCCG
11. GAAGTGAGAA GGGGGCTGGG GGTCG (G/-) CGQTCGCTAG CGGGCGCGGG
12. CGCACGCGCA GTATCCCGAT TGGCT (C/G) TGCCCTAGCG GATTGACGGG
13. AACTCCTGGG TTCGATCAAT ACTCA (GAGA/-) ATCTTGGCAG GCGCAGGAGG
14. GCTGGGATTA CAGGCTTGAG CCACC (A/G) CGCCCGGCCT GCAAAGCCAT
15. TTTTGTATCT TTAGTAGAGA CAGG (T/G) TTTCTCCATG TTGGTCAGGC
16. GCCTCAGCCT CCCGAGTAGC TGAGACT (C/A) CAGGTGCCCG CCACCACGCC
17. TGAAATTGTA GGTTGAGAGG CCAGGCG (C/T) GGTGCTCACG CCTGTAATTT
18. GTTTATAAAC ATTAAACCAG (T/A) GCTGTGTGAA GGCACTTAAT
19. CCGTCTCTAT TAAAAATATA AAA (A/C) AATTTAGCCG GGTGTAGCGG
20. GGGAGGCTCG AGGCGGGC (A/G) GATTGCATGA GCTCAGGATT
21. TCCCAAGTTT CAGGGCCCAA (T/G) ATTCTCAAAT CACAGGATTC
22. TGCAGTGAGC TGAGATCGC (A/G) CCACTGCACT CCAGCCTGGG
23. TCTTAGGACG CATGGGGGT (T/G) GAGAGAACGG GGAGATAGAC
24. CTGGGTTCTA GAACTACC (C/T) ATGCAAACCC AGCTGTTTCC
25. ATTCTGCCCT GGGTTCTAGA ACTACCT (C/A) TGCAAACCCA GCTGTTTCCC
26. GCTGTTTCCC ACCCCATAAG GCA (A/G) TAGGGGAGCC CACCTCCGCC
27. GACCTAGAAG ATCGGTCGAG A (C/T) AGCAGCTTGA GGCTGGCAGG
28. CTGGCCAGGA ATGCAGTCGG GTCAC (C/T) CTGTCTAGCC ACCGTCTCGC
29. GGGAGGAGTC GCCGATCAGG (C/T) CCCTTCCTGA AAGTCATCGA
30. GCAGCCCGGG CTACAGGGTT (A/G) CCTGAGGTGT GGGTCCCAGG
31. TAGAAATACT AACAAAGGGC (T/C) GTGGGTTTCT CCCCCTGCTT
32. ACAGGAGAGG GAAGGTTTTT TG (A/T) TTTTTTTTTT GTTTTTTTTT
33. GAAGAGGAAG AAGCCCAAAG GGA (A/C) AGAAACCTTC GAGCCAGAAG
34. GCGCCTCAAC AGCCAGAAGG AGCG (A/G) AGCCTCAGGC CCAGGCAGCT
35. TTGAGACTCT CTGTTTGAT (A/G) CTTCACTCAG AAGGTGCTTC
36. AGGCCAGGCT CCTGCTGGCT G (C/G) GCTGGTGCAG TCTCTGGGGA
37. CCCCTATACC CTCAAGCAT (C/T) TATCCATTGA GTTACAAACA
38. ACCATCCCCC GCCTTCCGTT (A/C) GTCCGGCCCC CGAGGCTAGC
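The notation used in the list above, with flanking sequence on either side of a bracketed pair of alleles and "-" marking a deletion, can be expanded mechanically into the two full-length variant sequences. The following Python sketch shows one way to do so; the function name and the parsing rules are assumptions made for illustration and are not part of the described method.

```python
import re

# Matches a bracketed allele pair such as (T/C) or (GAGA/-).
POLYMORPHISM = re.compile(r"\(([^/()]+)/([^/()]+)\)")


def expand_alleles(entry):
    """Return the two variant sequences encoded by 'FLANK (X/Y) FLANK'.

    Spaces used for 10-base grouping are removed, and an allele written
    as '-' is treated as a deletion (nothing inserted at that position).
    """
    compact = entry.replace(" ", "")
    match = POLYMORPHISM.search(compact)
    if match is None:
        raise ValueError("no (X/Y) polymorphism marker found")
    left, right = compact[:match.start()], compact[match.end():]
    variants = []
    for allele in match.groups():
        inserted = "" if allele == "-" else allele
        variants.append(left + inserted + right)
    return tuple(variants)


if __name__ == "__main__":
    # Subsequences 18 and 13 from the list above.
    for entry in (
        "GTTTATAAAC ATTAAACCAG (T/A) GCTGTGTGAA GGCACTTAAT",
        "AACTCCTGGG TTCGATCAAT ACTCA (GAGA/-) ATCTTGGCAG GCGCAGGAGG",
    ):
        first, second = expand_alleles(entry)
        print(first)
        print(second)
```

An allele-specific probe for either variant could then be chosen as a window of the corresponding expanded sequence that spans the polymorphic position.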



In another embodiment, the primers and/or probes are capable of hybridizing to a
subsequence selected from the group of subsequences below.

1. TGAAATTGTA GGTTGAGAGG CCAGGCG (C/T) GGTGCTCACG CCTGTAATTT
2. GTTTATAAAC ATTAAACCAG (T/A) GCTGTGTGAA GGCACTTAAT
3. CCGTCTCTAT TAAAAATATA AAA (A/C) AATTTAGCCG GGTGTAGCGG
4. GGGAGGCTCG AGGCGGGC (A/G) GATTGCATGA GCTCAGGATT
5. TCCCAAGTTT CAGGGCCCAA (T/G) ATTCTCAAAT CACAGGATTC
6. TGCAGTGAGC TGAGATCGC (A/G) CCACTGCACT CCAGCCTGGG
7. TCTTAGGACG CATGGGGGT (T/G) GAGAGAACGG GGAGATAGAC
8. CTGGGTTCTA GAACTACC (C/T) ATGCAAACCC AGCTGTTTCC
9. ATTCTGCCCT GGGTTCTAGA ACTACCT (C/A) TGCAAACCCA GCTGTTTCCC
10. GCTGTTTCCC ACCCCATAAG GCA (A/G) TAGGGGAGCC CACCTCCGCC
11. GACCTAGAAG ATCGGTCGAG A (C/T) AGCAGCTTGA GGCTGGCAGG
12. CTGGCCAGGA ATGCAGTCGG GTCAC (C/T) CTGTCTAGCC ACCGTCTCGC
13. GGGAGGAGTC GCCGATCAGG (C/T) CCCTTCCTGA AAGTCATCGA
14. GCAGCCCGGG CTACAGGGTT (A/G) CCTGAGGTGT GGGTCCCAGG
15. TAGAAATACT AACAAAGGGC (T/C) GTGGGTTTCT CCCCCTGCTT
16. ACAGGAGAGG GAAGGTTTTT TG (A/T) TTTTTTTTTT GTTTTTTTTT
17. GAAGAGGAAG AAGCCCAAAG GGA (A/C) AGAAACCTTC GAGCCAGAAG
18. GCGCCTCAAC AGCCAGAAGG AGCG (A/G) AGCCTCAGGC CCAGGCAGCT

In yet another embodiment, the primers and/or probes are capable of hybridizing to
a subsequence selected from the group of subsequences below.

1. GTTTATAAAC ATTAAACCAG (T/A) GCTGTGTGAA GGCACTTAAT
2. CCGTCTCTAT TAAAAATATA AAA (A/C) AATTTAGCCG GGTGTAGCGG
3. GGGAGGCTCG AGGCGGGC (A/G) GATTGCATGA GCTCAGGATT
4. TCCCAAGTTT CAGGGCCCAA (T/G) ATTCTCAAAT CACAGGATTC
5. TGCAGTGAGC TGAGATCGC (A/G) CCACTGCACT CCAGCCTGGG



It is preferred in one embodiment that at least one sequence polymorphism is
assessed in a region corresponding to SEQ ID NO: 1 position 1521-37752 (r),
including at least one sequence polymorphism assessed in a region corresponding to
SEQ ID NO: 1 position 7760-22885.

In another embodiment, the method of the invention relates to at least one sequence
polymorphism assessed in a region corresponding to SEQ ID NO: 1 position
34391-37683, ending with the coding region of ASE-1 (cagcctgtgtag), where tag is
the stop codon.

In another embodiment, the method of the invention relates to at least one sequence
polymorphism assessed in a region corresponding to the S1 as shown in Fig. 4.

In another embodiment, the method of the invention relates to at least one sequence
polymorphism assessed in a region corresponding to the S2 as shown in Fig. 4.

In another embodiment, the method of the invention relates to at least one sequence
polymorphism assessed in a region corresponding to the S3 as shown in Fig. 4.
More particularly, the method of the invention relates to at least one sequence
polymorphism being a deletion assessed in a region corresponding to the S3 as shown
in Fig. 4, more particularly a 4 base pair deletion in a region corresponding to the
S3 as shown in Fig. 4, even more particularly a deletion of TGTC in S3 as shown in
Fig. 4.

In a preferred embodiment the primers or probes are selected from one or more of 
the following: 

TGGCTAACACGGTGAAACC (SEQ ID NO:7)
GGAATCCAAAGATTCTATGATGG (SEQ ID NO:8)
GGGAGGCGGAGCTTGCAGTGA (SEQ ID NO:9)
CTGAGATCGCACCACTGCAC (SEQ ID NO:10)
GGTTTTCTGCTCTGCACACG (SEQ ID NO:11)
CCTTTCTCCTTCCACCAACG (SEQ ID NO:12)
CGGGCTACAGGGTTACCTGAG (SEQ ID NO:13)
TCTGCAACCTGGTGCGAGCAGC (SEQ ID NO:14)
CCTACCACCATCATCACATCC (SEQ ID NO:15)
GCCTTGCCAAAAATCATAACC (SEQ ID NO:16)
CCTCTCCCCAATTAAGTGCCTTCACACAGC (SEQ ID NO:17)
AGCCAGGGAGGTTGAGGCT (SEQ ID NO:18)
AGACAGCCCTGAATCAGCAC (SEQ ID NO:19)
GCAATGAGCCGAGATAGAA (SEQ ID NO:20)
TGGCTAGCCCATTACTCTA (SEQ ID NO:21)

According to another aspect of the present invention there is provided a diagnostic
nucleic acid primer capable of detecting a r region polymorphism at one or more of
positions in the r region as defined by the positions in SEQ ID NO: 1 or the s
region as defined by SEQ ID NO: 2.

The primer or probe may be a diagnostic nucleic acid primer defined as an allele
specific primer, used, generally together with a constant primer, in an amplification
reaction such as a PCR reaction, which provides the discrimination between alleles
through selective amplification of one allele at a particular sequence position. The
diagnostic primer is preferably 5-50 nucleotides, more preferably about 6-35
nucleotides, more preferably about 5-30 nucleotides, more preferably at least 9
nucleotides.


In accordance with the present invention diagnostic primers are provided,
comprising the sequences set out below as well as derivatives thereof wherein about
6-8 of the nucleotides at the 3' terminus are identical to the sequences given below
and wherein up to 10, such as up to 8, 6, 4, 2, or 1 of the remaining nucleotides may
be varied without significantly affecting the properties of the diagnostic primer.
Conveniently, the sequence of the diagnostic primer is as written below.

Furthermore, as described above at least two sets of primer(s) and/or probe(s) may
be combined in the method thereby increasing the correlation probability. This
second or other set of primer(s) and/or probe(s) may be a nucleotide or nucleotide
analogues hybridising to a region within the region r or to a sequence different from
the region r. Said sequence different from the region r is preferably a region in
chromosome 19, preferably in chromosome 19q. In particular such second or other
primer or probe may be selected from one or more of the sequences below, or the
complementary strands:

GCCCCGTCCCAGGTA (SEQ ID NO:21)
AGCCCCAAGACCCTTTCACT (SEQ ID NO:22)
GTCCCATAGATAGGAGTGAAAG (SEQ ID NO:23)
CCCTAGGACACAGGAGCACA (SEQ ID NO:24)
TTGTGCTTTCTCTGTGTCCA (SEQ ID NO:25)
TATCAGAAAAGGCTGGAGGA (SEQ ID NO:26)
GAGTGGCTGGGGAGTAGGA (SEQ ID NO:27)
GCCAAGCAGAAGAGACAAA (SEQ ID NO:28)
CCTCAGATGTCCTCTGCTCA (SEQ ID NO:29)
GCCACAGCCCCAGCAAGTAG (SEQ ID NO:30)
AGGACCACAGGACACGCAGA (SEQ ID NO:31)
CATAGAACAGTCCAGAACAC (SEQ ID NO:32)
TTAGCTTGGCACGGCTGTCCAAGGA (SEQ ID NO:33)
ACAGAATTCGCCCCGGCCTGGTACAC (SEQ ID NO:34)
TTGAAACTGGAACTCTGAGAAGG (SEQ ID NO:35)
TGGTGGATGGTGTGAAGCA (SEQ ID NO:36)
CCTTTCTCCAACTTCTTCTCCATTTCCACC (SEQ ID NO:37)
GGGGATCATGTCGTCAATGGACT (SEQ ID NO:38)
ATGCCCTGTAGGTTCAATGG (SEQ ID NO:39)
TGGAGGTCTTTAGGGGCTTG (SEQ ID NO:40)
GGCTGGTCCCCGTCTTCTCCTTCC (SEQ ID NO:41)
TCTCTGTTGCCACTTCAGCCTC (SEQ ID NO:42)
GTCCTGCCCTCAGCAAAGAGAA (SEQ ID NO:43)
TTCTCCTGCGATTAAAGGCTGT (SEQ ID NO:44)
ATCCTGTCCCTACTGGCCATTC (SEQ ID NO:45)
TGTGGACGTGACAGTGAGAAAT (SEQ ID NO:46)
TGGAGTGCTATGGCACGATCTCT (SEQ ID NO:47)
CCATGGGCATCAAATTCCTGGGA (SEQ ID NO:48)
CACACCTGGGTCATTTTTGTAT (SEQ ID NO:49)
TCATCCAGGTTGTAGATGCCA (SEQ ID NO:50)
AGGCTCAACAAGGAAAAATGC (SEQ ID NO:51)
GCTAGACAGTCAAGGAGGGACG (SEQ ID NO:52)
AAAGGGTGGGTGTGGGAGACATTGG (SEQ ID NO:53)
AAACCAACCTAGGGACCCCAAA (SEQ ID NO:54)
CAGTGTCCAAAGAGGACC (SEQ ID NO:55)
CTACCCCTTTAGCGACC (SEQ ID NO:56)
TCCTGCCCCCAGAGCGTCACC (SEQ ID NO:57)
GTACGGTCGACATAATTTTGGAGGA (SEQ ID NO:58)
CGACGAACTTCTCTGAAGCGAA (SEQ ID NO:59)
AGCGACACGGGCATCTGG (SEQ ID NO:60)
ATGAGCGTCCACCTCCTGAACC (SEQ ID NO:61)
AGGCAGCAGCATCGTCATCCCC (SEQ ID NO:62)
TGCATAGCTAGGTCCTGC (SEQ ID NO:63)
AACTGACRAAACTAGCTCTATGGGGTGGTGCCGCA (SEQ ID NO:64)
CTGGCTCTGAAACTTACTAGCCC (SEQ ID NO:65)
GCTGGACTGTCACCGCATG (SEQ ID NO:66)
GGAGCAGGGTTGGCGTG (SEQ ID NO:67)
TGCCCTCCCAGAGGTAAGGCCT (SEQ ID NO:68)
CCCTCCCGGAGGTAAGGCCTC (SEQ ID NO:69)
GATCAAAGAGACAGACGAGC (SEQ ID NO:70)
GAAGCCCAGGAAATGC (SEQ ID NO:71)
GGACGCCCACCTGGCCAACC (SEQ ID NO:72)
CGTGCTGCCCAACGAAGTG (SEQ ID NO:73)
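Because the primers or probes may be taken from the listed sequences "or the complementary strands", it can be convenient to derive the reverse complement and the G+C content of a listed primer programmatically. The sketch below is illustrative only; the helper names are hypothetical, and it deliberately accepts only the four unambiguous bases A, C, G and T.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    seq = seq.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unsupported characters: {sorted(unexpected)}")
    return seq.translate(COMPLEMENT)[::-1]


def gc_percent(seq):
    """Return the percentage of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    primer = "AGCCCCAAGACCCTTTCACT"  # SEQ ID NO:22 from the list above
    print(reverse_complement(primer))
    print(gc_percent(primer))
```

Sequences containing other characters are rejected rather than guessed at, so such an entry would need to be resolved by hand before being passed to these helpers.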

The primers and probes may be manufactured using any convenient method of
synthesis. Examples of such methods may be found in standard textbooks, for
example "Protocols for Oligonucleotides and Analogues; Synthesis and Properties,"
Methods in Molecular Biology Series; Volume 20; Ed. Sudhir Agrawal, Humana
ISBN: 0-89803-247-7; 1993; 1st Edition. If required the primer(s) and probe(s)
may be labelled to facilitate detection.



According to another aspect of the present invention, there is provided a diagnostic 
kit comprising at least one diagnostic primer of the invention and/or at least one al- 
lele-specific oligonucleotide primer of the invention. 

The diagnostic kits may comprise appropriate packaging and instructions for use in
the methods of the invention. Such kits may further comprise appropriate buffer(s)
and polymerase(s) such as thermostable polymerases, for example Taq polymerase.

Preferred kits can comprise means for amplifying the relevant sequence such as
primers, polymerase, deoxynucleotides, buffer, metal ions; and/or means for
discriminating the polymorphism, such as one or a set of probes hybridising to the
polymorphic site, a sequence reaction covering the polymorphic site, an enzyme or an
antibody; and/or a secondary amplification system, such as enzyme-conjugated
antibodies, or fluorescent antibodies. The kit-of-parts preferably also comprises a
detection system, such as a fluorometer, a film, an enzyme reagent or another
highly sensitive detection device.

The methods described herein may be performed, for example, by utilizing
pre-packaged diagnostic kits. The invention therefore also encompasses kits for
detecting the presence of a polypeptide or nucleic acid of the invention in a
biological sample (i.e., a test sample). Such kits can be used, e.g., to determine if
a subject is suffering from or is at increased risk of developing a disorder
associated with a disorder-causing allele, or aberrant expression or activity of a
polypeptide of the invention. For example, the kit can comprise a labeled compound or
agent capable of detecting the polypeptide or mRNA or DNA or RAI gene sequences,
e.g., encoding the polypeptide in a biological sample. The kit can further comprise a
means for determining the amount of the polypeptide or mRNA in the sample (e.g., an
antibody which binds the polypeptide or an oligonucleotide probe which binds to DNA
or mRNA encoding the polypeptide). Kits can also include instructions for observing
that the tested subject is suffering from or is at risk of developing a disorder
associated with aberrant expression of the polypeptide if the amount of the
polypeptide or mRNA encoding the polypeptide is above or below a normal level, or if
the DNA correlates with presence of an RAI allele that causes a disorder.

For antibody-based kits, the kit can comprise, for example: (1) a first antibody
(e.g., attached to a solid support) which binds to a polypeptide of the invention;
and, optionally, (2) a second, different antibody which binds to either the
polypeptide or to the first antibody and is conjugated to a detectable agent.

Identification of an allele as having implication for risk of cancer

An allele in the s or r region can be identified as correlated with an increased risk
of developing cancer on the basis of statistical analyses of the incidence of a
particular allele in two groups of individuals with and without cancer, respectively,
according to the chi-square (χ²) test, which is well known in the art. Furthermore,
an allele in the region can be identified as an allele correlated with prognosis of
cancer on the basis of statistical analyses of the incidence of a particular allele
in individuals demonstrating different prognostic characteristics.
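As a minimal sketch of the kind of statistical comparison described above, the following Python code computes a Pearson chi-square statistic for a 2x2 table of allele carriers and non-carriers among individuals with and without cancer. It assumes that the test referred to is the chi-square test with one degree of freedom and no continuity correction; the function name is hypothetical and the counts are invented purely for illustration, not taken from any study.

```python
import math


def chi_square_2x2(case_carriers, case_noncarriers,
                   control_carriers, control_noncarriers):
    """Pearson chi-square test (1 degree of freedom, no continuity correction).

    Returns (chi2, p_value) for the 2x2 table
        cases:    a = carriers, b = non-carriers
        controls: c = carriers, d = non-carriers
    """
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Invented counts: 30 of 100 cases and 15 of 100 controls carry the allele.
    chi2, p = chi_square_2x2(30, 70, 15, 85)
    print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

With small expected cell counts an exact test would usually be preferred over this approximation.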
Identification of humans having increased likelihood of responding to treatment

It is further contemplated that the present invention provides a method for
identifying a human subject as having an increased likelihood of responding
positively to a cancer treatment comprising determining the presence in the subject
of a s or r region allele genotype correlated with an increased likelihood of
positive response to treatment, whereby the presence of the genotype identifies the
subject as having an increased likelihood of responding to cancer treatment.

The treatment mentioned herein may be any cancer treatment, such as conventional
cancer treatment, for example X-ray, chemotherapeutics, surgical excision or
combinations thereof.



Protein Products of the Gene(s). 



Gene products of the region s or r, or peptide fragments thereof, can be prepared for
a variety of uses. For example, such gene products, or peptide fragments thereof,
can be used for the generation of antibodies, in diagnostic assays.

The gene products of the invention include, but are not limited to, human RAI gene
products and ASE-1 gene products. In the following the invention is described in
relation to the RAI gene product.

Gene product, sometimes referred to herein as a "protein" or "polypeptide",
includes those gene products encoded by the RAI gene sequences shown as position
7821-21350 in SEQ ID NO: 1. Among gene product variants are gene products
comprising amino acid residues encoded by the polymorphisms. Such gene product
variants also include a variant of the RAI gene product.

In addition, RAI gene products may include proteins that represent functionally
equivalent gene products. In preferred embodiments, such functionally equivalent RAI
gene products are naturally occurring gene products. Functionally equivalent RAI
gene products also include gene products that retain at least one of the biological
activities of the RAI gene products described above, and/or which are recognized by
and bind to antibodies (polyclonal or monoclonal) directed against RAI gene
products.

Antibodies to Gene Products

Described herein are methods for the production of antibodies capable of
specifically recognizing one or more gene product epitopes or epitopes of conserved
variants or peptide fragments of the gene products. Furthermore, antibodies that
specifically recognize mutant forms are encompassed by the invention. The terms
"specifically bind" and "specifically recognize" refer to antibodies that bind to RAI
gene product epitopes at a higher affinity than they bind to non-RAI (e.g., random)
epitopes.

Such antibodies may include, but are not limited to, polyclonal antibodies,
monoclonal antibodies (mAbs), humanized or chimeric antibodies, single chain
antibodies, Fab fragments, F(ab')2 fragments, fragments produced by a Fab expression
library, anti-idiotypic (anti-Id) antibodies, and epitope-binding fragments of any of
the above, including the polyclonal and monoclonal antibodies described below. Such
antibodies may be used, for example, in the detection of a gene product in a
biological sample and may, therefore, be utilized as part of a diagnostic or
prognostic technique whereby patients may be tested for abnormal levels of gene
products, and/or for the presence of abnormal forms of such gene products. Such
antibodies may also be utilized in conjunction with, for example, compound screening
schemes, as described below, for the evaluation of the effect of test compounds on
gene product levels and/or activity.

For the production of antibodies against a gene product, various host animals may
be immunized by injection with a RAI gene product, or a portion thereof. Such host
animals may include, but are not limited to rabbits, mice, and rats, to name but a
few. Various adjuvants may be used to increase the immunological response,
depending on the host species, including but not limited to Freund's (complete and
incomplete), mineral gels such as aluminum hydroxide, surface active substances
such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole
limpet hemocyanin, dinitrophenol, and potentially useful human adjuvants such as
BCG (bacille Calmette-Guerin) and Corynebacterium parvum.

Polyclonal antibodies are heterogeneous populations of antibody molecules derived
from the sera of animals immunized with an antigen, such as a gene product, or an
antigenic functional derivative thereof. For the production of polyclonal antibodies,
host animals such as those described above, may be immunized by injection with
gene product supplemented with adjuvants as also described above.

Monoclonal antibodies, which are homogeneous populations of antibodies to a
particular antigen, may be obtained by any technique that provides for the production
of antibody molecules by continuous cell lines in culture. These include, but are not
limited to, the hybridoma technique of Kohler and Milstein (1975, Nature 256:495-497;
and U.S. Pat. No. 4,376,110), the human B-cell hybridoma technique (Kosbor et al.,
1983, Immunology Today 4:72; Cole et al., 1983, Proc. Natl. Acad. Sci. U.S.A.
80:2026-2030), and the EBV-hybridoma technique (Cole et al., 1985, Monoclonal
Antibodies And Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). Such antibodies may
be of any immunoglobulin class including IgG, IgM, IgE, IgA, IgD and any subclass
thereof. The hybridoma producing the mAb of this invention may be cultivated in
vitro or in vivo. Production of high titers of mAbs in vivo makes this the presently
preferred method of production.

In addition, techniques developed for the production of "chimeric antibodies"
(Morrison, et al., 1984, Proc. Natl. Acad. Sci., 81:6851-6855; Neuberger, et al.,
1984, Nature 312:604-608; Takeda, et al., 1985, Nature, 314:452-454) by splicing the
genes from a mouse antibody molecule of appropriate antigen specificity together with
genes from a human antibody molecule of appropriate biological activity can be
used. A chimeric antibody is a molecule in which different portions are derived from
different animal species, such as those having a variable region derived from a
murine mAb and a human immunoglobulin constant region. (See, e.g., Cabilly et al.,
U.S. Pat. No. 4,816,567; and Boss et al., U.S. Pat. No. 4,816,397, which are
incorporated herein by reference in their entirety.)

In addition, techniques have been developed for the production of humanized
antibodies. (See, e.g., Queen, U.S. Pat. No. 5,585,089, which is incorporated herein
by reference in its entirety.) An immunoglobulin light or heavy chain variable region
consists of a "framework" region interrupted by three hypervariable regions, referred
to as complementarity determining regions (CDRs). The extent of the framework
region and CDRs have been precisely defined (see, "Sequences of Proteins of
Immunological Interest", Kabat, E. et al., U.S. Department of Health and Human
Services (1983)). Briefly, humanized antibodies are antibody molecules from
non-human species having one or more CDRs from the non-human species and a framework
region from a human immunoglobulin molecule.

Alternatively, techniques described for the production of single chain antibodies
(U.S. Pat. No. 4,946,778; Bird, 1988, Science 242:423-426; Huston, et al., 1988,
Proc. Natl. Acad. Sci. U.S.A. 85:5879-5883; and Ward, et al., 1989, Nature 334:544-546)
can be adapted to produce single chain antibodies against gene products. Single
chain antibodies are formed by linking the heavy and light chain fragments of the
Fv region via an amino acid bridge, resulting in a single chain polypeptide.

Antibody fragments that recognize specific epitopes may be generated by known
techniques. For example, such fragments include but are not limited to: the F(ab')2
fragments, which can be produced by pepsin digestion of the antibody molecule, and
the Fab fragments, which can be generated by reducing the disulfide bridges of the
F(ab')2 fragments. Alternatively, Fab expression libraries may be constructed (Huse,
et al., 1989, Science 246:1275-1281) to allow rapid and easy identification of
monoclonal Fab fragments with the desired specificity.

Immunoassays for gene products, conserved variants, or peptide fragments thereof
will typically comprise incubating a sample, such as a biological fluid, a tissue
extract, freshly harvested cells, or lysates of cells in the presence of a detectably
labeled antibody capable of identifying gene product, conserved variants or peptide
fragments thereof, and detecting the bound antibody by any of a number of
techniques well-known in the art.

The biological sample may be brought in contact with and immobilized onto a solid
phase support or carrier, such as nitrocellulose, that is capable of immobilizing cells,
cell particles or soluble proteins. The support may then be washed with suitable
buffers followed by treatment with the detectably labeled gene product specific
antibody. The solid phase support may then be washed with the buffer a second time to
remove unbound antibody. The amount of bound label on the solid support may
then be detected by conventional means.

By "solid phase support or carrier" is intended any support capable of binding an
antigen or an antibody. Well-known supports or carriers include glass, polystyrene,
polypropylene, polyethylene, dextran, nylon, amylases, natural and modified
celluloses, polyacrylamides, gabbros, and magnetite. The nature of the carrier can be
either soluble to some extent or insoluble for the purposes of the present invention.
The support material may have virtually any possible structural configuration so long
as the coupled molecule is capable of binding to an antigen or antibody. Thus, the
support configuration may be spherical, as in a bead, or cylindrical, as in the inside
surface of a test tube, or the external surface of a rod. Alternatively, the surface may
be flat, such as a sheet, test strip, etc. Preferred supports include polystyrene beads.
Those skilled in the art will know many other suitable carriers for binding antibody or
antigen, or will be able to ascertain the same by use of routine experimentation.

One of the ways in which the RAI gene product-specific antibody can be detectably
labeled is by linking the same to an enzyme, such as malate dehydrogenase,
staphylococcal nuclease, delta-5-steroid isomerase, yeast alcohol dehydrogenase,
α-glycerophosphate dehydrogenase, triose phosphate isomerase, horseradish peroxidase,
alkaline phosphatase, asparaginase, glucose oxidase, β-galactosidase, ribonuclease,
urease, catalase, glucose-6-phosphate dehydrogenase, glucoamylase and
acetylcholinesterase. The detection can be accomplished by colorimetric methods
that employ a chromogenic substrate for the enzyme. Detection may also be
accomplished by visual comparison of the extent of enzymatic reaction of a substrate
in comparison with similarly prepared standards.

Detection may also be accomplished using any of a variety of other immunoassays,
for example by radioactively labeling the antibodies or antibody fragments, or by
labeling the antibody with a fluorescent compound. Among the most commonly used
fluorescent labeling compounds are fluorescein isothiocyanate, rhodamine,
phycoerythrin, phycocyanin, allophycocyanin, o-phthaldehyde and fluorescamine.
The antibody can also be detectably labeled using fluorescence emitting metals
such as 152Eu, or others of the lanthanide series, or by coupling it to a
chemiluminescent compound.

Diseases

Described herein are various applications of gene sequences, gene products,
including peptide fragments and fusion proteins thereof, and of antibodies directed
against gene products and peptide fragments thereof. Such applications include, for
example, prognostic and diagnostic evaluation of cancer and the identification of
subjects with a predisposition to such disorders, as described above.

The method according to the invention may be used in relation to any cancer form,
such as, but not limited to, skin carcinoma including malignant melanoma, breast
cancer, lung cancer, colon cancer and other cancers in the gastro-intestinal tract,
prostate cancer, lymphoma, leukemia, pancreas cancer, head and neck cancer,
ovary cancer and other gynecological cancers. In particular the method is relevant
for skin cancer, lung cancer, colon cancer and breast cancer, such as skin cancer
and breast cancer, preferably wherein the skin cancer is basal cell carcinoma.

In particular, the method is relevant for early age cancer, such as early age breast
cancer.



Gene nucleic acid sequences, described above, can be utilized for transferring
recombinant nucleic acid sequences to cells and expressing said sequences in
recipient cells. Such techniques can be used, for example, in marking cells or for the
treatment of cancer. Such treatment can be in the form of gene replacement
therapy. Specifically, one or more copies of a normal RAI gene or a portion of the RAI
gene that directs the production of an RAI gene product exhibiting normal RAI gene
function, may be inserted into the appropriate cells within a patient, using vectors
that include, but are not limited to, adenovirus, adeno-associated virus, and
retrovirus vectors, in addition to other particles that introduce DNA into cells, such as
liposomes.

Examples

The examples relate to prediction from sequence polymorphisms in the region s or r
to cancer. Blood was collected before (example 6) or after (examples 1 through 5)
the persons acquired cancer. However, the sampling time is considered immaterial,
as DNA in a polyclonal blood sample is not expected to change over time.

The particular sequence polymorphisms analysed in these examples are listed in
Table 6, together with their sources of information and their definition as sequences.

Table 6. The markers used, their sources of information, and their currently
estimated positions on chromosome 19, as well as their position in figure 2.

Name          Source of        Position in   GenBank Accession    Chromosome   Position
              identification   sequence      Number of sequence   Position     in Figure 2
                                                                  (Mbases)
XRCC1 e10     Ref. 1           28152         L34079               59.420       1
CKM e8        rs#8188          20076         AC005781             61.361       2
XPD e23       Ref. 1           35931         L47234               61.479       3
XPD e10       Ref. 1           23591         L47234               61.491       4
XPD e6        Ref. 1           22541         L47234               62.4923      5
XPD i4        rs#1618536       19244         L47234               61.4924      6
RAI e6        rs#6966          8786          L47234               61.506       7
RAI i1        rs#1970764       875           L47234               61.514       8
ASE1 e1       rs#967591        232125        NT_011242            61.534       9
ERCC1 e4      Ref. 1           19007         M63796               61.547       10
FOSB e4       rs#1049698       34621         M89651               61.601       11
SLC1A5 e8     rs#1060043       60620         AC008622             62.946       12
GLTSCR1 e1    rs#1035938       20775         AC010519             63.986       13
LIG1 e6       rs#20580         111           L27710               65.460       14

rs numbers were derived from the NCBI's database dbSNP.

Ref. 1: Shen, M.R., Jones, I.M., and Mohrenweiser, H. (1998) Nonconservative
amino acid substitution variants exist at polymorphic frequency in DNA repair genes
in healthy humans. Cancer Res., 58: 604-8.

MATERIALS AND METHODS

Study groups. The groups of Caucasian Americans with and without BCC have
been described previously (Athas et al. Cancer Res. 51:5786-5793, 1991; Wei et al,
Proc. Natl. Acad. Sci. USA, 90: 1614-6, 1994). Briefly, the study was a clinic-based
case-control study at the Johns Hopkins Hospital, which serves multiple participating
dermatologists in Maryland. Cases were histopathologically confirmed primary
BCCs and were diagnosed between 1987 and 1990. The controls were patients from the
same physician practices and had a diagnosis of mild skin disorders. All participants
were Caucasians living near Baltimore and were between 20 and 60 years of age.
The controls were frequency matched to the cases by age and sex. Cases and
controls with any other forms of cancer were excluded. In the questionnaire, the study
subjects were asked if they had any blood relatives with skin cancer, and were
asked to specify the type of cancer. Study subjects with relatives with basal cell
carcinoma, squamous cell carcinoma or 'skin cancer' were included in the group
of subjects with a family history of skin cancer. Subjects with relatives with melanoma
were not included. At the clinic visit the subjects gave informed consent, were examined
by dermatologists, completed a structured questionnaire and provided blood. DNAs
from available frozen lymphocytes were purified using Puregene (Gentra Systems)
and were genotyped. Initially, 71 cases and 118 controls were included in this study.
However, the number of persons varied between analyses, as the supply of DNAs
was gradually depleted. In the case of the SNP RAIi1, only 133 persons could be
genotyped reliably.

The groups of 20 psoriatic Danes with and 20 psoriatic Danes without BCC have
been described previously (Dybdahl et al, Cancer Epidemiol. Biomarkers Prev.,
8:77-81, 1999). Briefly, BCC subjects were identified from a population-based cohort
of persons treated by Danish dermatologists in the year 1995, and fulfilled the
following criteria: (a) age in 1995 < 50 years; and (b) clinically verified diagnosis of
psoriasis. The diagnosis of BCC was clinically and histologically confirmed. The controls,
consisting of psoriasis cases without BCC, were selected from among patients treated
in the years 1992-1995 for psoriasis by dermatologists who participated in the
national cohort study in 1995. The controls were matched by age and sex. The patients
with psoriasis and BCC differed from the national cohort of BCC in that the average
age at first BCC was 38 years against 56 years in the cohort. A number of cases had had
multiple BCCs. There was a tendency that cases had been treated for a longer time
than the controls, and also that the treatments were more intense. This was to be
expected, as treatment of psoriasis involves a number of carcinogenic treatment
modalities. DNAs from available frozen lymphocytes were purified using Puregene
(Gentra Systems) and were genotyped.

Primers and probes. Table 7 includes the polymorphisms typed on Lightcycler™, the
primers used for the PCR reaction and the probes used for detection and typing of
the PCR products. Table 8 lists the polymorphisms typed by conventional PCR-RFLP,
and the primers and restriction enzymes used. Table 9 lists the polymorphisms
typed by SNaPshot technology and the primers used. Table 10 lists the polymorphisms
analyzed on a Taqman, and the primers and probes used. Hobolth DNA,
Hillerød, Denmark or DNA Technology, Aarhus, Denmark, synthesized the primers
in tables 7, 8, and 9. TIB Molbiol, Berlin, Germany synthesized the Lightcycler
probes. TAG-Copenhagen ApS (Tagc.com, Copenhagen, Denmark) synthesized the
primers, and Applied Biosystems synthesized the fluorescent Taqman probes in table
10.

Table 7. Design of primers and fluorogenic probes for LightCycler

ASE1 e1
Forward primer: 5'-GGTTTTCTGCTCTGCACACG
Reverse primer: 5'-CCTTTCTCCTTCCACCAACG
Anchor probe: 5'-TCTGCAACCTGGTGCGAGCAGC-Fluorescein
Sensor probe: 5'-LC Red 640-CGGGCTACAGGGTTACCTGAG-p

CKM e8
Forward primer: 5'-TTGAAACTGGAACTCTGAGAAGG
Reverse primer: 5'-TGGTGGATGGTGTGAAGCA
Anchor probe: 5'-LC Red 640-CCTTTCTCCAACTTCTTCTCCATTTCCACC-p
Sensor probe: 5'-GGGGATCATGTCGTCAATGGACT-Fluorescein

ERCC1 e4
Forward primer: 5'-AGGACCACAGGACACGCAGA-3'
Reverse primer: 5'-CATAGAACAGTCCAGAACAC-3'
Anchor probe: 5'-LC Red 640-TGGCGACGTAATTCCCGACTATGTGCTG-p-3'






Sensor probe: 5'-CGCAACGTGCCCTGGGAAT-Fluorescein

FOSB e4
Forward primer: 5'-AGGCTCAACAAGGAAAAATGC
Reverse primer: 5'-GCTAGACAGTCAAGGAGGGACG
Anchor probe: 5'-LC Red 640-AAAGGGTGGGTGTGGGAGACATTGG-p
Sensor probe: 5'-AAACCAACCTAGGCACCCCAAA-Fluorescein

GLTSCR1 e1
Forward primer: 5'-CGACGAACJTTCTCTGAAGCGAA
Reverse primer: 5'-AGCGACACGGGCATCTGG
Anchor probe: 5'-ATGAGCGTCCACCTCCTGAACC-Fluorescein
Sensor probe: 5'-LC Red 640-AGGCAGCAGCATCGTCATCGCC-p

LIG1 e6
Forward primer: 5'-ATGCCCTGTAGGTTCAATGG
Reverse primer: 5'-TGGAGGTCTTTAGGGGCTTG
Anchor probe: 5'-GGCTGGTCCCCGTCTTCTCCTTCC-Fluorescein
Sensor probe: 5'-LC Red 640-TCTCTGTTGCCACTTGAGCCTC-p

RAI i1
Forward primer: 5'-TGGCTAACACGGTGAAACC
Reverse primer: 5'-GGAATCCAAAGATTCTATGATGG
Anchor probe: 5'-GGGAGGCGGAGCTTGCAGTGA-Fluorescein
Sensor probe: 5'-LC Red 640-CTGAGATCGCACCACTGCAC-p

SLC1A5 e8
Forward primer: 5'-CAGTGTCCAAAGAGCACC
Reverse primer: 5'-CTACCCCTTTAGCGACC
Anchor probe: 5'-LC Red 640-TCCTGCCCCCAGAGCGTCACC-p
Sensor probe: 5'-TACGGTCCAGATAATTTTGGAGGA-Fluorescein

XPD e10
Forward primer: 5'-GATCAAAGAGACAGACGAGC
Reverse primer: 5'-GAAGCCCAGGAAATGC
Anchor probe: 5'-GGACGCCCACCTGGCCAACC-Fluorescein
Sensor probe: 5'-LC Red 640-CGTGCTGCCCAACGAAGTG-p



Table 8. Primers and restriction enzymes used for typing of SNPs using PCR-RFLP

Gene exon                Primers                    Enzyme   Digested fragments
XRCC1 exon 10            TTGTGCTTTCTCTGTGTCCA       MspI     240, 375 bp (A);
                         TATCAGAAAAGGCTGGAGGA                615 bp (G)
ERCC1 exon 4             AGGACCACAGGACACGCAGA       BsrDI    157, 368 bp (A);
                         CATAGAACAGTCCAGAACAC                525 bp (G)
XPD exon 6     1. set    CACACCTGGCTCATTTTTGTAT     TfiI
                         TCATCCAGGTTGTAGATGCCA
               2. set    TGGAGTGCTATGGCACGATCTCT    TfiI     66, 114, 482 bp (A);
                         CCATGGGCATCAAATTCCTGGGA             56, 596 bp (C)
XPD exon 23    1. set    GTCCTGCCCTCAGCAAAGAGAA
                         TTCTCCTGCGATTAAAGGCTGT
               2. set    ATCCTGTCCCTACTGGCGATTC     PstI     66, 100, 156 bp (C);
                         TGTGAACGTGACAGTGAGAAAT              100, 224 bp (A)
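
As an illustration of how the fragment patterns in Table 8 translate into genotypes, the following minimal sketch calls a genotype from the band sizes observed on a gel. It uses the XRCC1 exon 10 fragment sizes listed in Table 8; the function name, the data layout and the size tolerance of 10 bp are assumptions made for the example and are not part of the reported method.

```python
# Expected fragment sizes (bp) per allele for XRCC1 exon 10, as listed in Table 8.
ALLELE_FRAGMENTS = {
    "A": {240, 375},
    "G": {615},
}


def call_genotype(observed_bands, allele_fragments=ALLELE_FRAGMENTS, tolerance=10):
    """Return 'AA', 'AG' or 'GG' from a list of observed band sizes (bp).

    An allele is called present when every fragment expected for it is
    matched by an observed band within `tolerance` bp (hypothetical value).
    """
    def present(expected_sizes):
        return all(
            any(abs(band - size) <= tolerance for band in observed_bands)
            for size in expected_sizes
        )

    alleles = sorted(a for a, sizes in allele_fragments.items() if present(sizes))
    if not alleles:
        raise ValueError("no allele pattern matched the observed bands")
    if len(alleles) == 1:
        return alleles[0] * 2      # homozygote
    return "".join(alleles)        # heterozygote


if __name__ == "__main__":
    print(call_genotype([615]))            # single undigested band
    print(call_genotype([240, 375]))       # fully digested product
    print(call_genotype([238, 377, 612]))  # both patterns present
```

A heterozygote shows the fragments of both alleles at once, which is why the sketch tests each allele pattern independently instead of matching the full band list against a single pattern.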



Table 9. Design of primers and SNaPshot primers for SNaPshot typing on
sequenator.

XRCC1 exon 7
Forward primer: 5'-GTCCCATAGATAGGAGTGAAAG
Reverse primer: 5'-CCCTAGGACACAGGAGCACA
SNaPshot primer: 5'-TGCATAGCTAGGTCCTGC

XRCC1 exon 17
Forward primer: 5'-GCCAAGCAGAAGAGACAAA
Reverse primer: 5'-GAGTGGCTGGGGAGTAGGA
SNaPshot primer: 5'-AACTGACRAAACTAGCTCTATGGGGTGGTGCCGCA

RAI exon 6
Forward primer: 5'-CCTACCACCATCATCACATCC
Reverse primer: 5'-GCCTTGCCAAAAATCATAACC
SNaPshot primer: 5'-CCTCTCCCCAATTAAGTGCCTTCACACAGC

XPD intron 4
Forward primer: 5'-CGCAAAAACTTGTGTATTCACC
Reverse primer: 5'-CCCATTTTTATCATCAGCAACC
SNaPshot primer: 5'-CTGGCTCTGAAACTTACTAGCCC

Table 10. Design of primers and probes for Taqman.

XRCC1 exon 10
Forward primer: 5'-GCT GGA CTG TCA CCG CAT G
Reverse primer: 5'-GGA GCA GGG TTG GCG TG
Probe (A): 5'-Fam-TGC CCT CCC AGA GGT AAG GCC T-Tamra
Probe (G): 5'-Vic-CCC TCC CGG AGG TAA GGC CTC-Tamra

Determination of polymorphisms by Lightcycler. Genotypes of the American persons
for polymorphisms in ASE-1e1, CKMe8, ERCC1e4, FOSBe4, GLTSCR1e1, LIG1e6,
RAIi1, SLC1A5e8 and XPDe10 and of the Danish persons for polymorphisms
ASE-1e1, CKMe8, FOSBe4, LIG1e6 and SLC1A5e8 were detected using LightCycler™
(Roche Molecular Biochemicals, Mannheim, Germany). PCR was performed by
rapid cycling in a reaction volume of 20 µl with 0.5 µM of each primer, 0.045 µM of
anchor and sensor probe, 3.5 mM MgCl2, approximately 7-25 ng genomic DNA,
and 2 µl LightCycler DNA Master Hybridization probe buffer (Roche Molecular
Biochemicals, Cat No 2158 825). This buffer contains Taq DNA polymerase, dNTP
mix, and 10 mM MgCl2. In some cases the reaction mixture also contained 5%
DMSO. The temperature cycling consisted of denaturation at 95°C for 2 sec,
followed by 46 cycles consisting of 2 sec at 95°C, 10 sec at 57°C, and 30 sec at 72°C.
The last annealing period at 72°C was extended to 120 sec. The melting profile was
determined by a temperature ramp from 50°C to 95°C with a rate of 0.1 degree/sec.
For RAIi1 the melting profile was run 3 times, and the last curve was used.
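
The quantities in the protocol above can be checked with simple arithmetic. The sketch below converts a final concentration and a reaction volume into an amount of substance, and derives the duration of the melting ramp from its start temperature, end temperature and rate. The function names are illustrative only, and the figures used (0.5 µM primer and 0.045 µM probe in 20 µl, a ramp from 50°C to 95°C at 0.1 degree/sec) are read from the paragraph above.

```python
def amount_pmol(concentration_um, volume_ul):
    """Amount in pmol for a final concentration in µM and a volume in µl.

    1 µM equals 1 pmol per µl, so the product of the two is in pmol.
    """
    return concentration_um * volume_ul


def ramp_duration_s(start_c, end_c, rate_c_per_s):
    """Duration in seconds of a linear temperature ramp."""
    return (end_c - start_c) / rate_c_per_s


# Values read from the LightCycler protocol described in the text.
primer_pmol = amount_pmol(0.5, 20)           # each primer in a 20 µl reaction
probe_pmol = amount_pmol(0.045, 20)          # each probe in a 20 µl reaction
melt_seconds = ramp_duration_s(50, 95, 0.1)  # melting profile ramp

if __name__ == "__main__":
    print(f"primer: {primer_pmol:.1f} pmol per reaction")
    print(f"probe: {probe_pmol:.2f} pmol per reaction")
    print(f"melting ramp: {melt_seconds:.0f} s")
```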

PCR-RFLP analyses. Genotypes of the American persons for polymorphisms in
XPDe6 and XPDe23 and of Danish psoriatics for polymorphisms in XRCC1e10,
ERCC1e4, XPDe6, and XPDe23 were detected using the PCR-RFLP technique (Shen
et al, see above; Dybdahl et al, see above; Vogel et al, Cancer Epidemiol.
Biomarkers Prev., 8:77-81 (2001)). The reactions were performed as reported in
those references.



Determination of polymorphisms by SNaPshot technique on sequenator. The
polymorphisms in RAIe6, XPDi4, XRCC1e7, and XRCC1e17 in the American persons
were typed simultaneously on an ABI Prism 310 sequenator (Applied Biosystems,
Foster City, CA, USA) using the SNaPshot technique (Lindblad-Toh et al. Nature
Genetics, 24: 381-6, 2000). The PCR reaction consisted of 1 µl of purified genomic
DNA, 1 pmole of each primer (DNA Technology, Aarhus, Denmark), 12.5 nmole of
each dNTP (Bioline, London, UK), 100 nmole MgCl2 (Bioline), 0.15 µl BIOTAQ™
DNA Polymerase (Bioline) in a total volume of 20 µl of water. The program
consisted of 4 min at 96°C, followed by 25 cycles of 96°C for 30 sec, 60°C for 30 sec,
and 72°C for 60 sec. The last cycle was followed by 72°C for 8 min. The primers
and dNTPs were removed in reactions containing 2 U Shrimp Alkaline Phosphatase
(SAP) (Roche), 2 U Exonuclease I (Biolabs, Denmark), and 9 µl PCR reaction in a
total volume of 14 µl water. The reactions were incubated at 37°C for 60 min and
72°C for 15 min. The SNaPshot reactions contained 1 µl of SNaPshot Ready
Reaction Mix (Applied Biosystems), 0.5 µl of each SNaPshot primer (XRCCe7-ss1:
4 pmol/µl; XPDi5-cp1: 0.5 pmol/µl; RAIe7-cp1: 1 pmol/µl; XRCCe17-ss1: 2 pmol/µl),
2 µl of the purified PCR product, and 1.5 µl of buffer (200 mM Tris-HCl, 5 mM MgCl2,
pH 9.0). The reactions were cycled 25 times: 96°C for 10 s, 50°C for 5 s, and 60°C
for 30 s. The primers and dNTPs were removed in a reaction containing 1 U SAP,
0.8 µl 10X SAP buffer, and 5 µl SNaPshot reaction in a total volume of 8 µl of water.
Two µl purified product was added to 10 µl of concentrated deionized formamide
(Amresco, Ohio, USA), incubated for 5 min at 95°C, and analyzed on the
sequenator. The two markers in XRCC1, in exon 7 and exon 17, could not be reliably
scored and thus were excluded from further consideration.

Determination of polymorphisms by real-time PCR using Taqman probes. The
polymorphism in XRCC1e10 in the American persons was analysed using the ABI Prism
7700 sequence detection system (Applied Biosystems, Foster City, CA, USA). PCR
primers and Taqman probes were designed using Primer Express v 1.0 (Applied
Biosystems). The reactions were performed in MicroAmp optical tubes sealed with
MicroAmp optical caps (Applied Biosystems) containing a 10 µl reaction volume: 1x
Taqman buffer A, 2.5 mM MgCl2, 200 µM each of dATP, dCTP, dGTP, 400 µM dUTP,
800 nM each primer, 200 nM each probe, 0.01 U/µl AmpErase UNG, 0.025 U/µl
AmpliTaq Gold Polymerase. Thermal cycler conditions were as follows: tubes were
incubated at 50°C for 2 min followed by 10 min at 95°C. The incubation was succeeded
by 45 cycles of 95°C for 15 sec and 64°C for 1 min.

Example 1

DNA from humans from the American cohort of patients with basal cell carcinoma
and controls, described in Materials and Methods, was typed with respect to a
number of sequence polymorphisms located in and around the claimed region r. The
resulting statistical p-values for association of occurrence of the individual sequence
polymorphisms with the status of patients are depicted in Figure 2. Also depicted are
the calculated odds ratios for association of sequence polymorphism and disease.
For the calculation of the odds ratios the heterozygote genotypes were combined
with the lesser group of homozygotes, and the ordering of the groups was chosen
such that the odds ratio became more than or equal to 1. The results show that the
sequence polymorphism RAIi1 is strongly associated with disease in this cohort
(p = 0.004). Bonferroni correction for the number of tests made indicates that a result
less than 0.007 must be considered significant at a level of 0.05. Thus, even after
correction for multiplicity of testing this result is significant.
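
The Bonferroni criterion used above can be written out as a short sketch. The number of tests is not stated explicitly in the text; the value of 7 used here is an assumption inferred from the reported threshold of 0.007 at an overall level of 0.05, and the function names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def is_significant(p_value, alpha, n_tests):
    """True if p_value passes the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


ALPHA = 0.05
N_TESTS = 7  # assumption: inferred from the threshold of 0.007 given in the text

if __name__ == "__main__":
    print(f"threshold: {bonferroni_threshold(ALPHA, N_TESTS):.4f}")
    print("p = 0.004 significant:", is_significant(0.004, ALPHA, N_TESTS))
```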

The numbers next to the points in the curves are merely a help to identify the single
sequence polymorphisms:

1, XRCC1e10; 2, CKMe8; 3, XPDe23; 4, XPDe10; 5, XPDe6; 6, XPDi4; 7, RAIe6; 8,
RAIi1; 9, ASE-1e3; 10, ERCC1e4; 11, FOSBe4; 12, SLC1A5e8; 13, GLTSCR1e1;
14, LIG1e6.

Example 2

Those persons in example 1 who got basal cell carcinoma before the age of 50
years were selected, and the results from analysis of RAIi1 were compared with the
status of the patients. There was a strong relationship between the occurrence of
the individual genotypes of the sequence polymorphism and the status of the
patients (Table 7; Odds ratio = 12.3; p(χ²) = 0.00014).

Table 7. Occurrences of genotypes for the sequence polymorphism RAIi1 in Americans
with basal cell carcinoma occurring before 50 years of age and in controls.

RAIi1 genotypes   Number of cases before 50 years of age   Number of controls
AA                31                                        44
AG                2                                         32
GG                0                                         5
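
For orientation, the kind of calculation behind an odds ratio and a χ² p-value for a 2x2 table can be sketched as follows. The sketch groups the Table 7 counts as AA versus AG plus GG, which is an assumption about how the genotypes were combined, and it offers Yates' continuity correction as an option; the text does not state which variant was used, so values computed this way need not coincide exactly with those reported.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square statistic and p-value (1 degree of freedom) for a 2x2 table."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # Yates' continuity correction
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts from Table 7, grouped as AA versus (AG + GG); the grouping is an assumption.
cases_aa, cases_other = 31, 2 + 0
controls_aa, controls_other = 44, 32 + 5

if __name__ == "__main__":
    print("odds ratio:", odds_ratio(cases_aa, cases_other, controls_aa, controls_other))
    print("chi-square:", chi_square_2x2(cases_aa, cases_other, controls_aa, controls_other))
    print("with Yates:", chi_square_2x2(cases_aa, cases_other, controls_aa, controls_other, yates=True))
```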



Example 3

The data of Example 2 were combined with results of genotyping the neighbouring
sequence polymorphism RAIe6. There was a very strong association between the
combined genotypes of RAIi1 and RAIe6 and the status of the patients. Thus,
almost all American cases occurring before the age of 50 yrs were homozygous for
RAIi1^A RAIe6^A, while only approximately half of the controls were so (Table 8;
Odds ratio = 12.8; p(χ²) = 0.00006).

Table 8. Combined occurrences of different genotypes for the sequence
polymorphisms RAIi1 and RAIe6 in American cases occurring before 50 years of age
and in controls.

                          RAIi1
            RAIe6     AA     AG     GG
BCC cases   AA        30      0      0
            AT         0      2      0
            TT         0      0      0
Controls    AA        42     10      1
            AT         2     21      0
            TT         1      0      2


Example 4

The data of Example 2 were combined with results of genotyping the sequence
polymorphism GLTSCR1e1 located outside the claimed region r. There was a very
strong association between the combined genotypes of RAIi1 and GLTSCR1e1 and
the status of the patients. It was obvious to define "risk-genotypes" as having two As
in RAIi1 and at least one C in GLTSCR1e1. This corresponds to the assumptions
that RAIi1^A is recessive, and GLTSCR1e1^C is dominant. If one does so, one finds
that 25 out of 25 cases have a "risk-genotype", while only 28 out of 62 controls have
one (Table 9; Odds ratio > 30; p(χ²) = 0.000002).

Table 9. Combined occurrences of genotypes for the sequence polymorphisms
RAIi1 and GLTSCR1e1 in American cases of basal cell carcinoma occurring before
50 years of age and in controls.

                               RAIi1
            GLTSCR1e1     AA     AG     GG
BCC cases   CC            17      0      0
            CT             8      0      0
            TT             0      0      0
Controls    CC            15     18      3
            CT            13      7      0
            TT             3      3      0
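
The "risk-genotype" rule of Example 4 can be expressed as a small classifier and applied to the counts of Table 9. This is a minimal sketch: the function names and the dictionary layout are illustrative assumptions, while the rule itself (two As in RAIi1 and at least one C in GLTSCR1e1) and the counts are taken from the example.

```python
def is_risk_genotype(rai_i1, gltscr1_e1):
    """Risk genotype: homozygous A in RAIi1 and at least one C in GLTSCR1e1."""
    return rai_i1 == "AA" and "C" in gltscr1_e1


# Non-zero cells of Table 9, keyed by (RAIi1 genotype, GLTSCR1e1 genotype).
CASES = {("AA", "CC"): 17, ("AA", "CT"): 8}
CONTROLS = {
    ("AA", "CC"): 15, ("AG", "CC"): 18, ("GG", "CC"): 3,
    ("AA", "CT"): 13, ("AG", "CT"): 7,
    ("AA", "TT"): 3, ("AG", "TT"): 3,
}


def count_risk(table):
    """Return (number with a risk genotype, total number) for a genotype table."""
    risk = sum(n for (rai, glt), n in table.items() if is_risk_genotype(rai, glt))
    return risk, sum(table.values())


if __name__ == "__main__":
    print("cases:", count_risk(CASES))        # (25, 25)
    print("controls:", count_risk(CONTROLS))  # (28, 62)
```

Because no case falls outside the risk group, the sample odds ratio has a zero in its denominator, which is consistent with the example reporting it only as a lower bound.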





Example 5 

DNA from humans from the cohort of Danish psoriatics with basal cell carcinoma
and controls, described in Materials and Methods, was typed with respect to a
number of sequence polymorphisms located in and around the claimed region r. The
resulting statistical p-values for association of occurrence of the individual sequence
polymorphisms with the status of patients are depicted in Figure 3. The results show
that the sequence polymorphism ERCC1e4 is strongly associated with disease in
this cohort (p = 0.01).

Example 6 

Blood samples were collected from a large number of Danish citizens and frozen.
After a number of years those women who got breast cancer in the intervening
period were identified, as well as a set of matching controls. DNAs were purified from
the blood samples of these persons and a number of polymorphisms, namely RAIi1,
ASE-1e3 and ERCC1e4, in the region of interest were typed. The polymorphisms
were subsequently combined such that the high-risk group was homozygous for the
high-risk alleles of all three polymorphisms: RAIi1^AA ASE-1e3^GG ERCC1e4^GG. All
other genotypes were combined into the low-risk group (Table 10; OR = 1.59; p(χ²)
= 0.004).






Table 10. Occurrence of a combined "high-risk" genotype RAIi1^AA ASE-1e3^GG
ERCC1e4^GG as opposed to all other combinations of genotypes for the sequence
polymorphisms RAIi1, ASE-1e3 and ERCC1e4 in Danish cases of breast
cancer and controls.

            High-risk   Low-risk
Cases       120         85
Controls    277         312
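
The grouping used in Example 6 and the odds ratio derived from Table 10 can be sketched as follows. The genotype labels follow the example; the function names, the dictionary representation of a person's genotypes and the sample individuals are hypothetical and serve only as illustration.

```python
# High-risk genotype at each of the three polymorphisms, as defined in Example 6.
HIGH_RISK = {"RAIi1": "AA", "ASE-1e3": "GG", "ERCC1e4": "GG"}


def risk_group(genotypes):
    """Return 'high' if all three loci carry the high-risk genotype, else 'low'."""
    matches = all(genotypes.get(locus) == g for locus, g in HIGH_RISK.items())
    return "high" if matches else "low"


def odds_ratio(case_high, case_low, control_high, control_low):
    """Odds of a high-risk genotype among cases relative to controls."""
    return (case_high * control_low) / (case_low * control_high)


# Counts from Table 10.
or_breast = odds_ratio(case_high=120, case_low=85, control_high=277, control_low=312)

if __name__ == "__main__":
    # Hypothetical individuals, for illustration only.
    print(risk_group({"RAIi1": "AA", "ASE-1e3": "GG", "ERCC1e4": "GG"}))
    print(risk_group({"RAIi1": "AG", "ASE-1e3": "GG", "ERCC1e4": "GG"}))
    print(f"odds ratio: {or_breast:.2f}")
```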



The DNAs in these examples were purified from available frozen lymphocytes using
Puregene (Gentra Systems). A variety of other ways of purifying DNA are available to
the expert and would also be expected to lead to the wanted results.

Analysis of sequence polymorphisms can be performed with a variety of techniques,
some of which have been used in the examples of this application. Most often a
number of techniques can produce the wanted result.

Similarly, the choice of primers and probes in a particular assay is to some extent
free, and other primers and probes might well produce similar results.



Finally, it is to be expected that assays for other sequence polymorphisms in the
region of interest may produce roughly similar results. Our particular choice of
sequence polymorphisms and assays used in the examples is thus not intended to
limit our claims. Thus, at present about 30 SNPs within the region r are listed in
NCBI's database dbSNP, including rs#2070830, rs#2017104, rs#2017154 and
rs#2377328, all within or very close to RAI. Other forms of polymorphisms, such as
the tandem repeat polymorphisms D19S543 and D19S393, are also known to occur
in the region and can probably serve as markers in the present invention. Moreover,
it is very likely that the region contains a number of as yet undiscovered
polymorphisms. For instance, the sequence of the 5' half of RAI and its upstream promoter
region is currently only a draft version, and new polymorphisms of potential use for
this invention are likely to be uncovered as more sequence reads of this segment
are produced.

Sequence of the r region of chromosome 19

The following depicts the region r stretching from the beginning of, but not including,
the XPD gene, to approximately the end of ERCC1, and includes the genes RAI,
LOC162978, and ASE-1. More specifically, r is bounded by and includes the
following two sequences: AGAACCCCCG CCCCTCCACC TCGTCTCAAA and
TCCCTCCCCA GAGACTGCAC CAGCGCAGCC, and is defined by SEQ ID NO: 1
herein below:

10 AGAACCCCCGCCCCTCCACCTCGTCTCAAAAAAAAAAAAAAATC6TCTCAGTAGCGA- 
ATAGTCTAACGGAGAATG ACAGGGAAATTGGTGATCCTTTCTGGGCCCAAGAGTT A- 
GAAATGGCTTTGCAGGCCGGGCGCGGTGGCTCAAGCCTGTAATCCCAGCACTTTGG- 
GAGGCTGAGGCAGGTGGATCACCTGAGGTCGGGAGTTCAAGACCAGCCTGACCAA- 
CATGGAGAAAACCTGTCTCTACTAAAGATACAAAATTAGCCGGGCGTGCTGGCAAATG- 

CTTGTAATCCCAGCTACTCGGGAGGCTGAAGCAGGAGAATTGCTTGAACCTGGGAGG-
CAGAGGTTGCAGTGAGCAGAGATGGCGCCGTCGCACTGTAGCCTGGGCAACAAAAG- 
CGAAACTCCATTTCAAATATTAATAATAATAACTAATAAATAAAACATAAATGCTAGCTT- 
TTGTTTGTTTCTTCMCAAATAGCTATGTGGCATCTACCATGTGTCTGATCCTGTGCT 4 ' 
GGCCCCTGGGAACAGAAAGGTGACCATGACAGCCTCAGCACCTGCCCTCAAAGAACAr 

20 QATTTTTTTCCTTGAGACAGGGTCTTTCTCTGTCGCCAAGGCTGGAGTGCAGTGGCA- 
CAGTCACAGCTCACTGCAGCCTOCACCTCTTGGGCTCAAGCGATCCTCC CACCTCAG- 
CTTCCAGAGTAGCTGGGAGCACAGGTGTGCACCACCAAGCCCAGCTAAGTTTTATTTT- 
TTAAAT7TTTTTAGAGACGAGGTCTCACCACGTTGCCCAGGCTGGTTAAACTCGCAG- 
. GrrrCAAGTGATCCTCTCCCCTCAGCCTTTCAAATTGTTGGGATTACAGGGGTGAGG- • 

CACCAGGCCTGGCCTCAAAGAACAGATATTAAATATACAAATGAATATATGATTACAGC-
CTGGAGTGGTGGCTCGTGCCTGTGGTTCCAACACTTTGGAAGGCCAAGGCGAGTA- 
CATTGCTTGAGCTCAGGAGCTAGAGACCAGCCTGGGCAACATGGTGAAAACCCGTC- 
TCTACAAAAAATGCAAAAATTAGCTGGGCGTGGTGGCGTGCACCTGTAGTCCCAGAr 
TACTCAGGAGGCTGAGGTGGGAGAATCACCTGGGCCTGGGAGGCAGAGGTTGCAAT- 

GGGCAGTGATTGTGCCACTGCACTCCAGCCTGGGCAACAGGAGTGAAAACCTATCT-
CAAATGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGCGCACGTGTATAATCACAAGTA- 
CAAAAGTGCTGTGAAGGAAAACTTCAAGTCACCATAAAGATTGATTATGGGCTGGGTG- 
CAGTGGCTGATGCCTGTAATCCCAGCACTTTGGGAGGCCAAGGCAGATGGAT- 
CACGAGGTCAGGAGTTCAAGACCAGCCTGGTCAACATGGTGAAACCCTATCTCTAC- 

35 TAAAAAAAAAAAAAAAAAAAAAAAAGCCAGGCATAGTGGCATGCATCTGTAATCGCATC- 
TACTCGGGAGGCTAAAGCAGGAGAATTGCTTGAACCCAGGAGGCAGAAGTGAGCCAA- 
GATCACGCCACTGCACTCCAGCCTGCGTGACAGAGCAAGACTCCGTCCCAGAAAAA- 
GAAAAAAAAAAAAGACTTATTATGACAGGATGTCTACTGTCAACTGTGGGGTGTGAGT- 
GTTGGCCAAGTGATCAGAGAAGGCTTCGTGGAAGAAGCGAGGTTTGAGTAGAGCCA- 

GAAAATAATTAGAAGAGATCAACCAGCAAGAGGGGATGGATGAGAGAAGTGAGAAAG-
GTGTTCCAGGGAGAGAGACCATCATACACAAAAGCTCTAGGCCAGAAGAAAGCT- 
GAGGCCTGTGAGTGCTGAAAGGAAGCCTGTGGGGGTGGAGCTCTGAGTTGAGCA- 
CAGGGAQCAGAGAAAGGGCAGCTGGAGGGGAAGGCAGGGGCAGATCGAAATCTCTT- 
TTTTAAATTAATTAATTCTT AATTTATTTATTTTTGAGACAAGGTCTCACTCTTTCGCC- 

CAGACTGGAGTACAGTGGCACAATCTCAGCGCACCGCAACCTCTGCCACCCAGGCT-
* CAAGCAATTCTCTGGCCTCAGCCTCCCTAGTAGCTGGGATTACAGGTGCGCACCAC- 
TAGTGCCCAGCTAATTTTTATACTTTTAGTAGAAACGGGGTTTCACTATGTTGGCCAGG- 
CTGGCCTCAAACTCCTGACCTCAAAAGATCCACCCACTTCAGCCTCCCAAAGTGCTG- 
GGATTACAGGTGTGAGCCACCCTTCCCGGCTGTATTTFTGGAGACAGAGTCTTGCTCT- 

GTCCCAGCCTGGAGTATGGTGGTGTGAATTTGGCTCATTGCCACCTTGACCTCCAG-

GGCTCAAGTGATCCTCCCACCTCAGCCTCCTGAGTAGCTGGGACTGCGGGTACACGA- 
CACCACGCCTGGTTAA I 1 1 1 ITTTAATnTTTGTAGAGACGAGGGTATCTCACTATGTT-
gtccaggctggttgaactcctgagctcaagcaattctcccacctcagcctcc-
caaagtggtgggattacagacgtgagccactgtgoccggcttaatttatttacataa- 
atttttttatgtttactt7tctatctcct acaggaagaaaatatattttgttattgacag- 
ggtctcgctatgttgcccaggctggtattgggctcaagccatcctgttccctcagcc- 
tcccaaagtactgggattacaagcgtgagcctctgcatccagcccagatccaaaatct-
ttactgtcacctacagagtcctctgtaacta6cttactgct 
ccaccrttactgctctgatctcc^cctctctctcccccagctcattttgtttcagctatg- 
ctggtctccttgctgtctctaaaacataacaagcacatcccatctcagqgcctttg^ 
caccagctattttgtctgcc^ggaatgctgtttcccctgatagccatgtggctgaca- 

cactcacctccctcagctctttgctcaattgtcaacttctcggcccggcatggtggct-
cacacctgtaatcctaccactttgggaggctgaggtgggcagatcacctgagatcag- 
gagttcgagaccagc^tggccaagatggtgaaatcccgtctctactaaaaatacaaaa- 
attggcaaagcatggtagcacataccagtaatcctagctacccgggaggctgagg- 
caggagaattgctggaacccgggaggcagaggctgcagtgagccaagatcatgo 

15 cactgtactccagcctgggtgacaaagcaagactctgtctcaaaaaaaaaaaagtctc- 
cttctcaatgagggcttcctgaccaccaaattaaatctacctcctagacacacacaca-« 
cacgcacgcacgcacgcacacacacacacgcacgcacgcacacacacacacacaca- 
cacactatatcccctttccctgct^ 

catgctgaatattttacttatttattttgtttagaaagctcctggctgggcgcggg g gc- 

tcacgcctgtaatcccagcactttgggaggctggaacaggtggatcatgtgaggt-
caggagttccagaccagcctgaccaacacggtgaaacctcatctctattaaaaatg- 
caaaaattagctgggtgtggtgtcgcatgcctgtaatcccaactactcagaaggct- 
gaagcaggagaatcgctfgaacctgggaggcagaggttaacgctgagccgagatcg- 
cgccattgcactccagcctgggcaacaagagtgaaactctgtctcgaaaaaaa- 

caaaagtcagctccatggcaggagtgatggctcacgcctataatcccagcactttgt-
gaggccgaggcgggcggatcacttgaggtcaggagttggagaccagcctggccaa" 
catggtgaaacctcatctctactaaaaatacaaaaattagccgggcgtggtgacacat- 
gtctgtagtcccagctacttgggaggctgaggctggagaatggcttgaacctgg- 
gaggtagaggttgcagtaagccaagatcgcgccattgctctccatcctgggcaaca- 

gactccgtctcagaaaggaagaaagaaggaaagagagaaagagagaaagagacaga-
gagagagagagaaagggagaaagagagaaaggatggaaggaccctgacaagcact- 
gttggataaaagtttctntctctctc i hi i i i 1 1 m ii 11 111 i ii i g agacagggto 
tcacttctgttgctccagctqaaistgcagtggtgagaacatggctcagtgcagcct- 
caacttcccaggcttaagtgatcctgcca cctc agcctcctgagtagctggqactgk 

taggtgtgcaccaccgtgcctagcta a1 hi i igt ai 1 1 1 1 a gtag ag acatggttccg-
gcacgttgcccaggctggtcttgaactcctgggcttaagggatctgcccgccatggc- 
ctcccaaa gtgct gggattaccagcgtgagccagtgtacccagcctgagtataggtt- 
tctgataaattttaggatcat attgt ttggactgggtaagaatttccagaactctaat* 
gaagaaactgactggtttatattttattttattttattttatt a \ 1 1 i i g ag atggattt- 

tcactcttgttgcccaagctggattgcagtggcacgatcttggctgaccacaacctc-
cgcctcccggtttcaagtgattctcctgcctcagcctccccaggagctgggatta- 
caggcacccaccaccatgctcggct a i itttttttttatttttttattn i a gtagar 
gacggggtttcaccatg7tggccaggctggtctcgaactcctgacctcaggtgato- 
cacctgccttggcctcccaaagcgctgggattacaggcatgagccactgtgcaaggc- 

ctaggctggtttataaaattgctaaaccaagcagaacatgaattaaataccaaggaa-
atactctcctagattgtcatgttacatcagccaatactaaaattgtcaagatacacaat. 
ttgaatgaactccatggtccaagtcgaattatctatgatattacccatctaataaacag- 
cactatgtcccttaatgggagaaaaagttggagaatttaagagaatatcaatccaat- 
gttggttgggtgcagtgaatcatgtctatattcccagcactttgggaggccaagg- 

caggaggatcacttgagcccaggaattcaaggccagcctcggcaacacggtgagatc-
ctgtctctacggaaaattaaaaaaaaaaaaagagagagattagtgggatgtggtgcc- 
tatagtcccagctacttgggaggctgaggcgggaggatcatttaagcctgggacgtt- 
gaggttgcagtgaaccatgagtgagactcatctcaaaaaaaaaaaaaaaatggcgat- 
gactagaggaaaaaaaaactaaagtggggt7tgcgggtagtgggagggcccttcctg- 

ctaggttgcactatgatctccagggaggctccacgqgagaatcatttccttgt c i iii*
tcagtttctagagccaaattctttgcataccttgcattccttggctcggaaccccttcc- 
ctaaccttcaaagctggcagctagcctctggctcaagtgtcacatggcctgtctct- 
gtcttcctatccaatcttcctcttataagaacattggagccaggcatggtggctgacg- 
CCTGTAATCCCAGCACTTTGG6AGACCGAGGCAGQCGGATCACAAGGTCAGGAGT-
TCGAGACCAGCCTGGCCAACACAGTGAAACCCCGTCTCTACTAAAAAAATA- 
CAAAAAAGTAGCCGGGCATGGTGGCAGGTGCCTGTAATCCCAGCTACTTGAGAGGCT- 
GAGGCAGGAGAATCGCTTGAACCTGGGAGGCAGAGCTTGCAGTGAGCCGAGATAGT- 
GCCAATGCAGTCCGGCGTGGGCGAAACAGCGAGACTCCGTCGCAAAAAAAAAAAA-
ATAATAATAAATAATAAATAAAAATAAAAATAAAATAAAAAAATAAAAATAATAAAATAA- 
ATVW\AATTATTTTGAGACAAAGTCTATTGTGTGGCAGAGGCTGGAATGCAGTGGCGT- 
GATCACAGCTTACTGCAGCTTCTACCTCCTGAGCTCAAGCGATCCTTCCACCTTGGCT- 
TCCTGAGTAGCTGGGACCTCAGGTGTACATTACCACGCTCAGCTAATTATTTATTTATT- 
TATTATATTTTTGTGACGGAGTTTCGCTCTTGTTGCCCGGGCTGGAGTGCAATGGTGC-
TATCTCAGCTCACTGCAACGTCTGCCTCCTGGATTCCAGTGATTCT CCTGTC TC AGCT - 
TCCTGAGTAGCTGGGATTACAGGTAC^TGCCATCACGCCCAGCTAATT^ 
TAGTAGAGACGGGGTTTCATCATATTGGTCAGGCTGGTCTCGAACTCCTGACCTCAG- 
GTGATCCACCTGCCTTGGCCTCCCAAAGTGCTGGGATTACAGGC6TGAGGCACCACG- 
CCCQGCA AI rTTTTTTTTCTTTTTTTTTTI I C AGACAGAGTCTTGCTCTGTCACCCAGGO
TGGAGTGCAGTAGCGTGATCTCGGTTTACrrGGAACCTCCATCTCCCGGGTTCAAG- 
CGATTCTCCTTTCTCAGCCTCCCAAGTAGCTGGGACTACAGGTGCACACCACCACGG^ 
CGGGCTAATTTTTGTATTrrrAGTAGACACCAGGTTTCACCATATra 
TCAAACTCCTGACCTCAGGTGATCCATCTGCCTCAGCCTCCCAAATTGCTGGGATTA- 
CAAGCGTGAGCCACACACCTGG CTTA A I I II i I I ATTTTTGATCGAC AC AGGG TCTCCC-
TATGTTGTCCAAGCTGGCAGAGATTTTTGTT 1 0 1 1 T GTTTGAGAGGGAATTTTGCTCTT- 
GTAGCCCAGGCTGGAGTACAATGGTGCAATCTTGGCTCACCACAACTTCCGCCTCC- 
CGGGTTTAACAGATTCTCCTGCCTCAGCCTCCCAAGTAGCTGGAACTACAGGCACC- 
TACCACCACACCAGGCTAATTTTTGTGCnTTTTAGTAGAGATGAGGTTTCACCATGTT- 
GGCCAGGCTGGTCTTAAACTCCTGGCCTCCAGTGATCCACCCGCCTTGACCTCC-

CAAAGTGCTG AAATTACAGGCGTGAGCACCGCGCCTGGCCTCTCAACCTACAATTT- 
CAACACCCAAGGAAACAGCCCACCATGAGTGAGAACCAGCAGACACAACAAACTA*
TAGGATTAGCTGCCTCCAAACTTCAGGTGATAGATTATCAGGCATGTACTTGAAAC- 
TAAAGGACACAAAAGAAGAATCCGAAATATAAAATAAAGGATTGGACTTGTGTGAAAAr. 
GAATCCCTTAGAAAGGGCTACTTTCAGGCTGGCCATGGTGGCTAATGGCCTGTAATC-
CCAGCACTTTGGAAGGCCGAGGTGTGTGGATCACCTGAGGTCAAGAGTTCAAGAC- . 
CAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTGAAAATACAAAAATTAGCCAGGT- 
GGGGTGGCAGATGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCGO- 
TTGAACTCAGGAGGCAGAGGTTGCAGTGAGCTGAGATTGCGCTATCGTGCCCCAGCC- 
TGGGCAGTAGAGTGAGATCAAAAAAAAAAAAAAAAAAAGAAGAAGAAGAAGAAAGGGC-
TACTTTCAGACTGCCTTGCCAAAAATCATAACCACAATGATGAGCATGTATTGAGT- 
CAAAACAGAATCAAAAGAGAAGAAAGTCAATTTCTGTGCAAACTACTrrTTATTTATAAG- 
GAAAGTTTCTCTATTTTGTTTATAAACAlTAMCCAGTGCTGTGTGAAGGCAClTAATT- 
GGGGAGAGGTGGGGCAGGGATCCTGGTAGAGACCAATGTTTCCCACCCAGACCC- 
CAAGACTGCTGGGAGAGATGGTGTCAGCAGTGACTCCCAGGAATATCCAGTGGTGTG-
GTGGCCCATCCCAGGCCCGGCTGGGCAGGTGGCTGGCTTGCTGGGGGATGTGAT- 
GATGGTGGTAGGCATGGGAGGCACTTTGGACGGGATC5TGATTTGGCAAAAGGAAGTG- 
GTTTCCTGTCCCCAGTGATTrCCAGCX^CTTCCCAGACCTCCCAAGGCTAAGGCAGAT- 
TACTAAATTTAAGGCTGGGGCCCTCCTTCTTCCCTGGACTTCCAGGAGAACAGAGAAC- 
CGGTGGCAAGGACCAGCACCAGGAGGGTGAGGGGTGCAGATAAAGGCAGCAAAAAA-
CAGAGGGAGAGGTCTGGAGGGAAGGCAGGAATGCTTGTTTCTGTCAGCCTCAGAAAC- 
CTCCTTCTATCCTGCTAGACTTTACTCCTTTGAGGCTTCACCCTGGGGAACAGCTGGG- 
GAGAGACAGGATCTTCAGACATCAGGAGCTCCCACCTCCTCATCCCACATGGAAATC- 
CGCTGCCrGTCTCTATCCTCCCACCCCTrCCtAAGGGGACCTCTCAGCACCTCC- 
CAAACTGCTCCAGAATCCAAGTTCTGTGTCACCTCCAAGAACCAGATGGAACCTTCCA-
ATCAGAGCCTCCACTGATGAAATGGAATAT1TCCAGTGTGTCCTAACTGCCATAAGGA- 
GAAGCCCACCTCTCTCTAACACCTTGGTTGTCTTTTTGGGTCCCACCTCCATATT- 
TAAAAAATCTCCTCTCTCAGGGCCGGGAGCAGTGGGTCACACCTATAATCCCAGCAGT- 
TTGGGAGGCCGAGGTGGGTGGATGACCTGAGCTCAGGAGTTCAAGACAAGCCTGGT- 
CAACATGACGAGACCCTGTCTCTACTAAAAACACAAAAAATTAGCTGGGCGTGGTGGT-
GCATGCCCGTAATCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATCACTTGAATCCG- 
GGAGGTGGAGGCTGCAGTGAGCCAAGATCGCGCCACTGCACTCCAGCCTGGG- 
CGACGCAGCTGAAGCTGTGTCTCCAAAAACAAAACACACACACACACACACACA- 
GAAAAAAAAAACCAAAATAAAAAAATCTCCCH^

GCCTTCTCCCCTAACCCTAATAGAGAATTTTCCTCAGTTACACTGTAATTTTATTAATG- 
GATTTTTCCTCATTCTGCCCAATGCAGTGTAATGAAAGCTTCCTCTCCATCTGTTATAT* 
TATATATAAATATATATTATATATTTATATATTATATATTTATATATAACATAT^ 
TATTGTCACCCAGGCTGGAGTGCAGTGGCACCATCAGGGCTCACTGCAGGATCAATC-
TCCCAGGCTTAAGCGATTCTCCTG TGTCAGCCTCC TG ATGAG CTGGGATTACAGG- 
CACCCGCCACCACACCCGGCTAACI 1 1 1 1 I 1 1 1 1 IGTAr 1 1 1 1 AGTAG AGATGG AGTTT- 
CACCATGTTGGCCAGGCTGGTCTAGAACTCCTGACCTGAGGAGATCCGCCCGCCTT* 
GGCCTCCCAAAGTGCTGGGATTACAGGTGTGAGCCACCTGGCCGGGCCCTCCACT- 

TCCTTCTTGTACATTGCTGAATCCCTGTGTCAGCCCTAGAGGTCCAGTCTTTTGGCCTC-
TCCCAGCCTTAATCTACAATTCTGTAACCCACCCACCATCATTAAAATGAGATTCTTCT- 
TTGTCGCTTCCCTTGGGTAAAATGGATTATTCTTTAACCTCTCCACCAATACAACCAGG' 
6ATGATAATAAAAACATTGGATTGAGCAGAAACCAATCAAATAACTAGTAAGGCAGTAC- 
TGGCGAGCACCCTACATCCTGACAGCTTTATAAAGGGCGCTTCCAGCCAGGTGCGGT- 

GGCACATGCCTGTAATCCCAGGACTTTGGGAGGCTGAGGCGGGCAGGTCACCTGAG-
GTCAGGAGTTCAAGACCAGCCTGGCCAACGTGATGAAACCCTGTCTACACAAAATA- 
CAAAAAAAAAAAAAAAATTAGCCGTGCGTGGTGGCATGCGCCTGTCATCCCAGCTAC- 
TCTGGAGGCCAAGGAGGGAGGATCACTTGAGCCCGGGAGGCAGAGGTTGCAGTGAG- 
CCCACATCTTATCACTGCACTCCAGTCTGGGTGACAAAGCAAGACTCCATCTCAA- 

ATAAATAAATACAMTTGGCCGGGTGCGGTGGCTCATGCCTGTAATCCCAGCACTTTG.
GGAGACCAAGGCAGGTGGATCATTTGAGGTCAGTAGATCAAAACCAGCCTGQCCAA- 
CATGGTGAAACCCCGTCTCTACTAAAAATAGAAAAAGTAGCCGGGCGTGGTGGTGGT- 
GGGCGCCTGTAATCCCAQGCAGGAGAACTGGTTGAGCCCGGGTGGGGGGGGCC- 
CGAGGTTGCAGTGAGCACAGATGGCGCCATTGCACTCCAGCCTGGGCGACAGAG- 1 

CGAGACTCCGTTTCAGAAATAAATAAATAAAATAAAAATAAAAATAAAAAAATAATAGAA-
ATTTAAAAATAAAATAAAGGGCTTTTCCTCAC CTACT CCACTAACTATAAGGGACCCT-. 
TACCCCCGAGATTACTATTAAATATAACGGACTTTTCGTCTCCTCX5CCATGAGCAAT^ 
ATGAGCTmCAGACCTCCCTCTCCCAATATAAC^ 

TTCCTGTGGGATCCCCCTnTCCCCAACCCCCAACTGTCGGGAGGTCCCCATGACTTC- 
TCCCCTGGGCTCACCCCGAAGTAGTTCCGCGGCACGTAGCCCTCCTGGCCGTGCAG-

CGCGGCCCACCACCAGTCGGTCTCCTCCGGCCCGTCCCTCCGCAGCACGGTGAC- 

CGACTCGCCCTCGCGGAAGGACAGCTCGTCCCCGAACTCGGCGCTGTAGTCCCAGA- 
GAGCGTACACTGCCCCGCTGTTCATCAGCCCCATACTCTGpTCGACGTCTGAAACAT-

GCCACGGAGGGGAAGGTGAGAGCCTGGCCCAGGGGGTCCAGGAACAGGGGC- 
CACGTGGGGTCCAGGACAGACCCTGGAATTTGGCGCCTGTCCCAGCAACCACCTGAA-

ATGTTGTGTGTGCGCATGGCTGTGGATGGGAACCGG 

GACTGGCCQTCTTTGAGCGTTCGAQGAAACTGGGGGAGGC ATGCGAGTGGGCGAC C- 
CACTCCCGAGGCAGGGTCAGAGGCTCCCAI I ICI II ICTTTCl nTTTTTTTTTTTl IGA- 
GACAGAGTCTCGCTCTGTCGCCCAGGCTGGAGTGCAGTGGCACGATCTCGGCTCAC- 
TGCAACCTCCQCCTCCCGGQTTCACACCATTCTC CTGCCT CAG CCTCC CGAGTAGCT'
GGGACTACAGGCGCCCGCCACCACGCCTGGCTAATTT7TGGTATTTTTAGTAGAGT- 
CAGGGTTTCACCGTGTTAGCCAGGATGGTCTCGATCTCCTGACCTTGTGATCCGCC- 
CACATTGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGCGCCCGGCCTTT- 

n 1 1 1 m i \ n iii n n n gagatggaatttcgctcttgtcgcccaggcaggagtgca- 
atggtgcggtctcactgcaacctccgcctccggagttcgagccattctcct gcct-

CAGCCTTCCAAGTAGCTGGGATTACAGGTGTGCGCCACCATGCCTGGCCAAi 1 1 i iG- 
TATCTTTAGTAGAGACGGGGTTTCACCATGTTGGTCAGGCTGGTATCAAACTCCTGAC- 
CTCAAGTGATCCACCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGC- 
GACCTGGCCCGGCCCTCATTTCCTTCTTGTACATTGCTGAATGCCCGTGTGAACCCT A- 
GAGGTCCAGTCTTTTGCCCTACCCTGGCGCTTAGCTTAAGTGGTACAGTCTCTAAG-

GAAGATTCGCACCTTCCTTGAATGATAGGGTCCTTTAAGTTGGCTCATCTGCCTCT7TC- 

1 1 1 rem rem r c n nrcrm i ggagacggagtcttgctctgtcgcccaggctg- 
gagtgcagtggcgcgatttcggctcactgcaacctccgcctcctgggttccagcaat- 
tctcctgcctcagcctccaaagtagctgggactacaggcccacgccgctacacccgg. 
ctaaattgttttatatttttaatagagacggggtttcaccgtgttgcccaggctgg

ggaaatcctgagctcatgcaatccgcccgcctcgagcctcccaaa gtgctagg atta- 

CAGGCATGAGCCACCGCGCCTGG C T T rCTTTlTCTTTTCTnTC TTTTTrrTri ICAGA- 

caaggtctcactctgccacccaggctgcgggagtgcagtggtgagatcaagcttact- 
GCAGCCTCGAACTTCCAGATTCAAGCAATCCT

TATGTTATTATTAAATATTTTGTAGGCCGGGCACAGTGGCTCACACCTATAATCACAG- 
CACTTTGGGAGGCCAAGGCAGGCGGATCCTCTGAGGTCAGGGGTTTGAGACCAGCC- 
TGGCCAACATGGCAAAACCCCGTCTCTACT AAAAATACAAAAAAAAAAAAAAAAAAAGT- 
TAGCGGGCCGTGGGGCCCTTGCCTGTAATCCCAGTTACTCGGGAGCCTGAGGCAG-
GAGAATCGCTTTCACCQAGGAGGCAGAGGTTGTAGTGGGCTATGGTGCCATTGCAC- 
TCCAGCCTGGGTGAGAGAGCAAGACTCTGTCTCAAAAAATAAATAAATAAAAATAA- 
ATAAATATTTCGTAGAGGTCAGGTGTGGTGGCTCACACCTGAATCTTAGCACTTTGG- 
GAGGCCAAGGTGGGCAGATTGCCTGAGCTCAAGAGTTCGGGACCAGCCTGGGCAA- 

CACTGCAAAACCCCTTC7GTACTAAAAATACAAAAAAATGAGTCGGGGATGGTGGT-

GAGCACCTGTAGTCCCAGCTACTCAAGAGGCTGAGGCAGAGAATTGCTTGAATCCAG- 
GAGGTGGAGGTTGCAGTGAGCCGAGATTGAGCCACTGCACTCCAGCCTGGGTGA- 
GAGTGAGACTCTGTCTCAAAAATAATAATAAATAAATATTTGTAGA<3ACAGGGGGTCTC- 
TACAATGTCTTGTAGCCTGACCAGGCTCACCTTTCAAATATATAACCCTCTGTCTCACC- 

CATAAGTCCTAGGACCTGCCTCACTCCAACTCTCCGTGAAGTTCCTTGCCCACACCGA-
GATACAACTGGCTCCTCCAGGTGTGAAATGACCCTGTGCACAATCCCCGTGGCACAG- 
CCTACTTCGCCCTGCCCGTCGGGGAACCAGGTGATGTAGCCTGCCCCCTGGAGAGAr 
TAGGGTACAGCCnTGTGTCTTCCTACAAGCCCCTTTCTGGGAGCTGTAGCCTGCTCAC-- 
CTGCCAGTGGTGTGGCAATGCCTCTCCCACAAGTGGCAGAGCCCACCTGCCCAGAG- 

CCCTATGCCAGGTAGATGGCAGGGTTGAAACGTTCAGCTCCTCACGGTTGAAGATGT-
GAAAGGTGAGCAGACGAATCXTTCACAGCCACTCTCCTCCCCAAAGGTGTCCAGCrCG- 
CATAGCACAGCCTCCATGTCCCCTTTTCCCTTAGGAGGGCATAGTCCCCCCACCCC- 
CGCAAGCGGTCCATCCCTCATCCTCCTCCTCGGCAATCCTGCCAAGTGGTTGGTA- 
CAGCCCCCATACCCTTCTCTCCCTAGTAGGGGGTAGTTGCTCCCCTCCCCGCTCCTG- 

CGCACCCGCCAGGTACCCAGGCGCCAGCAGCCCTGCCTCGCACCTGCCAGGTAGGT-
GGCGCAGTCAGCATAACCCTCGCGGTAAGGGTCGCACTTCTCGAAGGCGGTGGCGC- . 
CGTCGCTGAGCGTGGTGGCGAAGATTGCAGCGCCGTGCTGCACCAGCGCCATGCA- 
GATGACTGTGTCGTTGCACGACGCCGCGCAGTGCAAGGGTGTCCTAGGCGTGGGG- 
GTGGGGGG7TGCGGGGAACGATGCGTGAGAGGCTGCGCGTCCGCCCACGGGGGAC- 

CCAGCCCACCGCGCS3GGTCGGGGCTCACGAGCCGTGGCTGTCGGGGGAGTTGA-
CATTGGCACCCGCGGTGATGAGGAAATCCACGATAGAGTAGTTGGCGCCGCAGAT- 
GGCGTTGTGCAAGGCAGTGATGCCCTCCTCGTTGGGCrGGCTCGGGTCGTTCATCT- 
GAGTGCACCGGGGGAGGGGGAAGACTCAGTCCCGCGGCTGGCATCTGCGATGCCC- 
CCGCCGTGCCCACCTCCCGCTCAGCAGCGCTCACCTCCTTCACCGCCTGCTGCAO 

CACCTCCAGCTCCCCGGTCAGCGCCGCGTCCAGGAGGAGCACCAGAGGGTTGAGG-
CGCGCGCGGCGGGCCTTGCGCGGGGAGCCCGCCTTCCGCAGCACAGAGCGCATC- 
TCCTGGGGGACAGGGCGGAGAGGTCAGCGACTTGGAGGGATTGTTAGTATA7CCAT- 
GATCTAGAGTAGGAAACAGAGGTCCAGGGACTTGTGGCACCCATCTAGACAGGGGTA- 
GAACTGGGATTCCCTCGGGATGGGGTGAGGGGGTGCCTTCGATCTCCTCCTAGAGCC- 

TCCAGTTCCCTGCCATAGACAGGGAATCCTGTGATTTGAGAATCTTGGGCCCTGAAAC-
TTGGGAGAAAGCTGGGGGGCGATGGGATTGGTGGCAAAGTAATTCTATCAGTT- 
CAAAACAATGATTGTGGAAGCCAGTTATGCAATTCACACACAGTGTCACATTTCTTTT- 
GTTAATAATGAATGCAATGAGACACACATGACAAAATGTTACCAGGAG TGTT CATTC- 
CGGATGTTTGGAATTTGAGCATTTTATTATTCCTTGTATTTTG C I I 1 I CT1 I I I CTCTTT- 

rTnrrmTm ig agatggagtctcqctctgtcacccaggctggagtgcagtg-

CAGTGGTGTGATCTCAGCTCACTGCACCCTCCATCCCCCAGGTTCAAGCAATTCTCCT- 
GCCTCAGCXTCCTGAGTAGCTAGGATTACAGGCATGCGCCACTATGCCTGGCTAATTT- 
TCATAI I I I iAGTAGAGACAGGGTTTTGTCATGTTGTCCAGGCTGGTCTCGAACTCCT- 
GACCTCAGGTGATCCACCCACCTCAGCCTCCCAAAGTGCTAGGATTACAGGTGTGAG- 

CCACTGTGCCCAGCCTCATGGGCTTTCTTAn 1 1 I AATTTTCCTCCTGTAAGATTCATT-
TATTCTGGGCTGGGCGAGGTGGCTCATGTCTGTAATCCTAGCACTTTGGGAGGCT- 
GAGGTGGiSAGGATCACTTGAGCCCAGGAGTTCGAGAACAGCTTGGGCAATATAGTGA- 
GACCCAGTCTCTACAAAAAATAAAAAATTAGCCTGACATGGTGGCGCACACC- 
CGTCGTCCCAGCTACTTGGGAGGCTGAGGCAGGAGGATTACTTGAATGGAAGAGAAG- 

GAGGCTTCAGTGAGCCATGATCATGCGACTGCACTCTAGCCTGGGCAACAGAGTGA-
GACCCAGTCTCAAAAGAAAAAAAAATGCATTTATTTATTCCAAGTGTGTGAGTGCATAG- 
CATTTGTGATTCTGGTCTTrGCTGTTrCCAGAGTTTGAGTGATTT^ 
CAGAGATCCCAACAGCCACTGAATTCAAAATTCCCAGATGCTCAGTTATTTCAAGTTTC- 
CAATATGTTGTGATTGCA6AAATGCTAGGCT6TQCTATTTCAAATTGCTGAGGGGC-

CAGGACTTTGGAATCCAAAGATTCT ATGATGGAGAACTTTAATATTTTTCTGTTAGAATT- 

T Ci I 1 1 riTT GTTG GI 1 1 1 1 1 I GAGACAGAGTCTCGCTCTGTCGCCCAGGCTGGAGTG- . 

CAGTGGTGCGATCTCAGCTCACTGCAAGCTCCGCCTCCCGGGTTCAGGCCATTCTCC- 

TGCCTCAGCCTGCCAAGTAGCTGGGACTACGGGCGCCCGCCACCACGCCTGGCTATT- 

TTGTATTTTTAGTAAAGATGGGGTTTCACCGTGTTAGCCAGGAAGGTCTTGTTCTCCT- 

GACCTCGTGATCCGCCCACCTCGGCCTCCCAAAGTGCTGGGATTACAGGTGTGAGC- • 

CATCATGCCTGACCTAGAATTTCATnTAAAAGACTAGAAGGAAATGGCTGGGTGCG* 

GTGGCTCATGTGTGTAATCTCAGCACTTTGGGAGGCTGAGGAGAGTGGATCACCT- 

GAGGTCAGGCAGGAGTTCAAGACCAGCCTGGCCAACGTGGTGAAACCCTGTCTCTAO 

TAAAAATACAAAAATrAGGTGGCCGTGGTGGTGCACGCCTGTAATCCCAGCTACTCAG- 

GAGGCCGTGGCATGAGAATCACTTGAACCCAGGAGGCACAGTTATAGTGAGCTGA* 

GATGGCACCATCGCACTCCAGCCTGGGTGACAGAGTGAGACTCCATCTCAAAAAAG- 

GAAAAAAAAAAGAAAGACTAGAAGGAAATATTCAAAAT GTTAA TGATGGTTCCCTGT- 

GAGTGGTGTQATTTTGTCCT C 1 1 IC) IC T AI I I 1 1 A TTTATTTTCCCCAAGCTCTCTATG- 

GTGTTGGTGTATTTCTCTATAGTGGAATGTGTAAATTTAAAGTATAAATCTCAGCTGGG- 

CACAGTGGCTCATGCCTGGTTTGAGACCAGCCTGGACAACATAATGAGAACTGTCTC- 

TACTGAAAATGTTAAATA7TATCTGGGAGTGGTGGTGCATGCCTGTAGTCCCAGC- 

CATAGGGGAGGCTGAGGCATGAGGATCAATTGAGCCCAGTAGGTGGAGGCTGCAGT- 

GAGCCATGATCTTGCCACTGCACTCCAGCCTGGGCAACAGAGTGAGACTCTGTC- 

TCGATAATAATAACCCTCTATTACAACATATCAGTGCATGAATTTGTGATTTTATAATT- 

CAAAATATGAGCATCTTTAATTGTCAGATTTGGTGACTTCAAGAATCA GTAAT AAT- 

CAGTCTATG ATACTAACTTTATAATT A I HUM I AAGAGAAG AGTTTCCTTTTATTT- 

TATTTTATTTGAGACAGAGTTTCTCTCTGTTGCCCAGGCTGGAGTGCAGTGGCGCA-^ 

ATCTCGGCTCACTGCAGCCTCTGTCTCCTAGGTTCAAGCAATTCTCCTGCCTGAGCC- 

TCCCG AGTAGCTGG GATTACAGGCATGCACCACCAGGCCCAGCTAA I 1 1 i I GTATTTT- 

TAGCAGAGACGGGGTTTCACCATGTTGGCGAGGCTAGTCTTGAACTCCTGACCT- 

CAAGTGATCCACCCGCCTCGGCCTCCCAAGGTGCTGGGATTACAGGCATGAGCCAC^ 

CGTGCCX^AGCCTAACTTTATAATTCTAAGATCGTGTTCAAACCTTTAAATGCTCTAGGG- 

CTCTAAAATGTTAGTATCCTAAGACGGTGACACTAGCGTTrQATTCTTACATTGTATCAT- 

TTTTTAAGTTTCTCTGTGGCGAGGACTCTGTGATTCTACAATGGGATGCTCAGCCATTT- . 

CAACATGTTGTTATTCATCCCCTCTTGATTTCAAAATCCTGAGCCTCAAGGTTCCTTGC- 

CTTTACTTTCAGGAGGGCCTAGGAATAGGCATTTTGGGGGGGTCCACCTGACCCCTG- 

CTTCTCTGAGAAGTGATCTCTTCCCGCTGTCTACGCACACGGAGTGTTCAGGACTGT- 

TCCATGTGGCTACAACCCTCTTCCCAGTCAAGATGCAGGGACCAAGATCAGCAGGA- 

GACCATCCCCTGGTCOAATGGTGACAACAGTAAGAGCAGTTAACAGTTATGTGCCAGG- 

tattatgctaagcactacattaatgtatttaatcttgggggggtgtggtggctcacacg* 

tgtaatcccagcactttgggaggccagggcgggcaGatcacttgaggtcaggagtt- 

caagaccagcctagccaacacagtgaaaccccatctctactaaaaatacaaaaattag- 

ccaagcgtggtggcatatgcctgtaatcccagccacttgggagactgacgcaggaga- 

atcactttaacocaggaggtggagtccagcacccagccgagactcacttgtttttatt- 

T ATTT ATTTATTTATTTTT A t rTTTATTTTTTI 1 GaGACQQAATGTTGCTCTGTCACC- 

CAGGCTGGAGTGCAGTGGCGGGATCTCAGCTCACCACAAGCTCCGCCTCCCGGGCT- 

CACGCCATTCTCCTCTCAGCCTCCAGAGTAGCTGGGACTACAGGCGCCCGCCACCAC- 

CCCCAGCTAAI 1 1 1 IGTAI 1 I 1 1 AGTAGAGACGGGGTTTCACCGTGTTAGCCAGGATG- 

GTCTTATCTCCTGACTTCGTGATCCGCCCGCCTCGGCCTCCCAAAATGCTGGGATTA- 

CAGGCATGAACCACCACGCCCGGCCTATTTATTTATTTATTTAGAGATGGAGTCTTGC- 

TCTGTCGCCCAGGCTGGAGTGCAGTGGTGCAGTCTTGGCTCACTGCAACCTCCGCCT- 

TCCGGGTTTAAGCGATTCTCTTGCCTCAGCCTCCTGAGTAGCTGGGATTGGAATGA- 

GACCACCACTTCTCXTGTTGTCCTTCCCAGCTrCTCCCCCACCTCCCCTmCCCTAGT- 

7TATAAGACAGGAAAAAAAGGGAGAAAGCAAAACGCTGGAAAAAAACAGAAGTACGA- 

TAAATAGCTAGATGACCTTGGCGCCACCATCTGGTCCTGGTGGTTAAAATAATAATA- 

ATAATATTAATCCCTGACCAAAACTACTGGTGTTATCTGTAAATTCCAGACATTGTAT- 

GAGAAAGCACTGTAAAACGTTTTGTTCTGTTAGCTGATGTCTGTAGCCCCCAGT- 

CACGTTCCTCACGCTTACTTGATCTATCGTGGCCGTTTCACGTGGACCCCTTAGCGTT- 

GTAAGCCCTTAAAAGTGCTAGGAATTT C 1 1 1 J I C GGGGAGCTCGGCTCTTAAGA CGCT - 

GATGCTCCCGGCCGAATAAAAACCTCTTCCTTCTTTAATCCGGTGTCTGAGGAGTTTT- 

GTCTGTGGCTCGTCCTGCTACAGAATTACAGGCACGCGCCACCGCTCCGGGCTAATT- 
TTTGTAJ rm n'AGTAGACAGGGGGTTTCACCATGTTGGTCAGGCTGQACTTGAACC-
TCTGACCTCATGATCCACCCACCTCGGCCTCCCAAAGTGCTGGGATTACAGGCGT- 
GAGCCACCGCGCCCGGCCGAGACTCACTATTTTATAAGAGGAGAGAGCAAAGCCAG- 
GAACAGTGGCTCATGCCTCTAACTGCAGCAATTTGGGAGGCTGAGGCAGGTGGAT- 
CATTTGAAGTCAGGAGTTTGAGACCAGCCTGGCCAGCATGGTGAAACCTCATCTCTAC-
TAAAAATACAAAAATTAGCCAGGAGTGGTGGCATACACTTATAATCCCAGCTACTTGG- 
GAAGCTAAAGCGGGAGGATGGCTTGAACCTGGGAGGCGGAGGTTGCAGTGAGC- 
CGAGGTCAAGCCACTGCACTCCAGCCTGAGTGATGGAGCAAGACTCTGCCTG- 
GAAAAAAAAAAAAAATAGAGGAGAGAGCAGAGCAGACAGAAGAGACACAGAGACAGA- 

GAGGGAGAGAAGAGAGGGTGACTGCTTTGATTCAGGCAAGACTTCTCAGTCCCAGA-
ATGAACCCACTGTTGTGCCAAGACTCAGTCATGTCCAGGTGTATGACTCGAGATTGCT- 
GAAGGAATGCCCGGGGCAGGGCACAGGCACAGGTTATTGGAGAGAAGGAGCAGA- 
GAACATCTCTATGTGGCCAAGACTCCCAGATGGCCCTCCATATAGTCACACACAGC- 
TATCCTAAAGACTACATTTCCCAGCATCCCATTGCAATGAGGCTCCTGGCCAGTGG- 

GAGCAGGCAGAGTGATGTATGGAACTCCCAGGTTCTGCCTGAAACAGGAAAGGGCAC-
TTTCTCTTCTTCTTTCTCT^ 

GACCATGAAGGCAGGCTTACTCCCCGATGGATGGCAGAGCCCCAGGTAGATAGAGCC- 

TGGGTCCTGACTCCAGTGAGGTGCCTACAGTCCTGGGCTGCAAACTCTTGGACTTC- 

TACTCAAAAGAGGAGAAAACTTCGATCTCATCTAAGCCACTATATTTGGGGGGCTCT- 

TTGCTACAGCTCCTGGATTCATGTAGCAAACATACCCCGGTTTCCTCCTGTATTACT-

TACCATGCTCTGCGGCTGCTCTGGTGGGCTGCTCTGGGACGGGGCCGGGGGTGGA. 
ATGGGAGCTGGTGGGGCAGGAGCAGGGGGCCCTGCCCTGGCCTGAGATCCCTCAGT- 
GATGGGGGACAGCTCTGGCTCCGGCCCCCCGGGCCCTGGCCCCCCATGACGATG- 
GAAGAGGCQGCTGATGATCTGCTGGTACTG 1 1 lO l 'l GTGGGTAGGGGGCAQGGCCA- 

CAGCAGGGGC CTG CTCCATGG AG CCCCTGCGTTTGAGG GGCCGGGG AATTTCCGC-
CAACACCCGTGCCACGTCCTCCAGCTCGGGCACCGACTGTGCCTCCGGTGGCAGTG-
CTGGCTGCAGCCTCGTGGGGCTGAGAGGCCTTGCTACAGGGCCTTCATCCACATCG- 
CCAGCCTCCAGC ACTG GTGTCAGC AGCCCCTCTATCTCCGGCTC AGGCTCCAGGTCG-
GTGGGGGGTTrGGGGGGTCCTAGCCGGAACAAGAGCCCATCAGAGGACAGGTCCC-

CAGGAGACACCCAACACTCCCTCTCCACAACTTCCAGGGCATACAACCAGCACATGAT-
"TTTCTGTGTGACCTCAGGGAAGTTCCTTGCCCTCTCTGGGCTACACTTTCCTTGGGCT- 
GTGAATAATATACAATTATGATGCCTCCCATTTATTGAGCAGTTAGTATGTGCCTGGCG-
CTTTACATGCCTACCTTATTGTAATCTCACCACTGCTTTGTGAGGTAGATACACTGC- 
CATCTCCACATTACCGAAAGGGAATCTGGGCCTCAGAGAGGACAAGTCAGTTGCC- 

CAAAGCCATGCAGTTGGGACTTGAACTCAGTTCTGGCTGACTCTAGAATCTACTTC-
TACCAACCGTGATAGATGTGATTTTCTGAGATCCTGAGAGTTTCCTCTCCTAACATCT- 
CAGGCAGAAAACTCCAGCAGGAAGTAGAATCC7GGTGTTTAATGATTTCTTCTCTGTCT- 
TACTCATTCTGACAGTAAAGCAGGTGGAAATAAAAATATGCATTATTGGCT- 
GAGTCGAGTGGCTCACACCTdTAATCCCAGAACTTTGGGAGGCCGAGGCAGGCA- 

GATCTCTTGAGATCAGGAGTTTGAGACCAGCCTGGCCAACATGGTAAAACCCTGTCTC-
TACTAAAAATACAAAAAAAAAAAAAAAAAAAAAAAAATTAGCTGGGCGTGGTGGCACAT- 
GCCTGTAATCCCAGCTACTCGGAAGGCTGAGGCACAGGAATCGCTTGAACCCAG. 
GAGGCGGAGGTTGCAGTGAGCCGAGATTGCACCACTGCACCACTGCACTCCAGCCT- 
GGGCAAAAGAGTGAGATTTCATCTCAAAATATATATATATACACACACACACACAAACA- 

CACACACACATTATATATATAGTGTATATATATTTTTATATAGTATGCATATACATATAA-
ATAATACACACACACACACACGGCTGAGCATGGTGGCTCATGCCTGTAATCCCAGCAC- 
TTTGGGAGGCTGAGGTGGGTGGATCACCTGAGGTCAGGGGTTCGAGACCAGCCTGG- 
CCAACATGGCAAAACCTCATCTCTACTAAAAACACAAAAAATTAGTTGGGTGTGGTG- 
GTGCATGCCTGTAACCCCAGCTACTTGGGAAGCTGAGGTAGGAGAATCGCTTGAACO 

TGGGAGGTGTAGGATGCAGTGAGCTGAAACCTCACCACTGCATTCCAGCCTGGGCAA-
GAAGAGTGAAACTCCATCTTGGCTGGGCACGGTGGTTGACGCCTGTAATCCCAGCAC- 
T7TGGGAGGCC6AGGTGGGCAGATCATGAGGTCAGGAGATCGAGACCATCCTGGO 
TAACATGATGAAACCCCGTCTCTACTAAAAATACAAAAATTAGCTGGGGGTGGTGGTG- 
GGCGCCTGTAGTCCCAGCCACTCGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCG- 

GGAGGCGGAGCTTGCAGTGAGCAAGCACCACTGCACTCCAACCTGGAAGAAAGAG-
CGAGACTCTGTCTCAAAAAAAAAGAGTGAAACTCTGTCTCAAAAATAAATAAATAA- 
ATAAACXCCAAAACACACACAGATACACATTATTTCATTGAATCCCCGTCACAATTCTA- 
TAGGGTAGATATTATTAATCTCTCTTCACAGACGGGAAACAGAGTTTCGGACAAGTAAT* 
TTATCTTCAGTCACACAGCMGTTAGCAGTGAAGAGAGACTrCCAGCCCATCTGCT-
TAACTCACTGATCTC^CACCTCAAAATATTAATAAATTATTATAACTAATATGGTAGC- 
TATTTATTTGAGACTGGGTCTCACTCTGTCACCCAGGCTGGAGTGCAGTGGCGCTAT- 
CACAGCTCACTGCAGCCTGGATCTCCCA3QCTTAAATGATCCT CCCACCT CA GCAT CC- 
TGAGTAGCTGGGACTACAGGCGCCCACTACCATGCCCGGCAGATTTTTTGTACrnT-
TATTTTTAGTAAAGTCrrATTTTAGTTTCACTATGrrGCCCAGGCT^ 
GAGCTCAAGCAATCCTGTCTGCATTAGCCCACCAAACTGCTAGGATTACAAGGGT- 
GAGCCACGGTGCCTGGCTAATATGGTAGCTATTGATAGCTTACTATGTATCAQATCC- ' 
TATTTATTTATTTATTTTTGAGACAGAGTCTCACCCTGTCACCTGTGCTGGAGTGCAGT- 
GGCATGATCTTGGCTCACTQCCACCTCCGCCTCCTTGGCTCA AGCTGAGTAGC TAG-
GACTACAGTGGTGAGCCACCATGCCCAGCTAAI II 1 1 II 1 1 II M II I I II I M iGATAr 
GAGATGGGATTTCATCATGTTGTCCAGGCTGGTCTTGAACTCCTGACCTCAAGTGATC- 
TGCCCACCTCGGCCTCCCAAAGTGCTGGGATTACAGGTGTGAGCAACTGCACCTGGC- 
CCATCAGGTGCTGTTTTAAAGGCTTTATATGAATTTAATAACATATGTCAATAG- 
GATCGATTCT ATCATTATTTGCC 1 1 1 M 1 1 1 1 1 1 1 1 1 I I 1 1 I I I'GAGGCAGAGTCTCCC-

CGTCACCCAGGATGGACTGCAGTGGCGCAATCTCGGCTCACTGCAACCTCCACCTCC- 
CGGGTCCAAGTGATTCTCCTGCCTCAGCCTCCCAAGTAGCTGGGACTACAGGCGCC- 
CGCCAGCATGCCTGGCTAATTTTTGTATTTTTAGT AGAGATGGGGTTTGATATTGGC- 
CAGGCTG6TCTCGAACTTCTGACTTTGTGATCCGCCCGCCTCG GCCTC CCAAAGTGC" 
TGGGATTACAGGCATGA6CCACCGTGCCCGGCCCATTATTTCCCI I I IACACTCAA-
GAAAATTGAGGCCCAGTGAGGTTAAGTGACTTGCCCAAGGTCACACAGCGTGGAAC- 
CAGGCAGTCTGGCTTCAGGGTCCACACTTAACCTTTGAGCTATCCCTGGCTCCTACC- 
CAAATTCCCAAACTCACCTGGCCTAGCTCTCTGCAGGGACAGTGCTTGTAAAGAGG- 
CATTTGGCTGTGATCTCCCCACCTCCCAGGGCTGGTCTGGTCCCCCTGCCATTTGTCC- 
TCCCTTCACCCAGTCCTCTAGGGCCCTCATTGCTGACTCACCTTCGTTCACAGGGGC-
CATGTCTGTTGGGGATGCTGGGGGGCTGGGGTAGGGGT7TGGGGTTGGGTCTGGGG-

CTGTGGGGGCAGCTGGGGCTGTGGTTGTGATTGTGGCTGGGGCTGTGGTTGTGGTT-- 
GGGGCTGCAGCTTAGGCGGGGGTGCTCGGGTGAAGAGGGGGGACCCAGGGAGCAT- 
GGCGCGGCTGGCCCCGTGCTCCCAGAAGGCGTTCTGCAGCTTGAAGATCATGCT- 
GAGGGGGATGGGACGCTGGCGCGGGGCCCCGCGGGGCTGGGGGCTGGAGGGGGG-
CATGGGGATGCGGCTGACGGGCTGCCAGCTGCGAGGCAAAGTGCCCGACGGCCC- 
CGCGGAGCCCAGCGAGCGCCGGTAGCTGCCCGCGTCTGAACGCCGGTCGCTGGC- 
CAGAGGAGAGACCTTGTAATTGCGCGGCAGGGTGGCGCTAGTGAGGTTGTCCTGGG- 
GAAGAGGGAAGGGAGAAGGGGATCGGGTGAGAGAGGGAAGGTGGAGGGGAGG- 
TAAAGACAAAAGACGAGAAGGGAGAGGAGGTGAGGGAAGCCCTGGGAGTGAGGGA-
GAAGAAAGGGTGAGGAAGGAGCAGAAACCCAGCACAGTGAAGGGAGAGCGTGG- 
GAACGGGCGCCGAGACCCAGATGGCAGCCCCGAGGGGGAGACTGGCCTTGACCC- 
CGCTCCCCCACCCCACTCCTCGACCTTCCCCAGCCTCTCCTCCCCAGGCGTCGCCTC- 
CTCACCTTGCCGGTGCCCCCCAGTCCATCCAGGCTGCTCTCCCTCGAAGGCAACAGC- 
TGCAGGCTCGGCGAGGCAGGCCTTGCGAAGACGTCCAGGCCTGCGGGGCGGGAAT-
CATTAGGGTCTGTGGGGCTGCCTCTCCTCCGGGTCCTCCATrCCGCGGGCCTCCAC- 
CACTCACGTTCATAGCTCGCTGTCTGCGAAGGCTTCTTCTCGTACGCCACGTCCAGGT- 
CAGACTCGTTCCAGGCTTTCGGAGGCCGCCGGCGCAGCGTCAGGTCGTCTGGGGA- 
GAAGTTTCCAGGGAGGATGAGACGGGAGGGGTGGCGAGCCCCGGATCCTGCCCGCT- 
TTGACCCCGCGAGTCAAAGGCCCCGCGAGGGGCCCCTGGGTTCACCTTGCGCGCG-
CAGAGGCGGGGCGAATGCGCTGCCGCCGGAGCCTAGCAGGGAGCTCCCGAAGGCG- 
GACGCTGGCGCGTCGTAGGCTGTGGCAGGGGGGCGCGGTGACGGCCCACGCTCGG. 
GGAAGAAGGCCTGGGGCCCCTCCGCCAGGGGGCTGCCGCGGGGGGAGCCTGCG- 
CGGCCCAGGAAGTCGAAAGGCGTGGGGGGACCCTGCTGGCGGAGCGGGCCTGGCC- 
CGGGCCGCGGGGAGGGCGCACGGCCGAGGGAGCTGCCTGCGCCATCGAAGGCG-
CGGGGCCGGGGCGAGGTCGCGCGGTCCAGGCTGCCGTAGGCGTCCGGCTGCAGG- 
TAGAGCGGGGTGCGCGGCGACGACGGCCGTCCCTTGGGGGACAGCGGGCTGTAGG- 
GGTGTAGGGTTGGGGCACTCTCTGATCGTCCGAACGGGGTGTCTGCGCCGTCGGT- 
GGCCGCCTTCCGGGGGGACCCTCGGCTGCCGAAGGGCTCAGGGATCGAGCTGGAG- 
CTGTACCQGGGCGGCTGTGGGGAGGCCAGGGCATTGAGGGATGGATCAAAGGAGA-
CATTAGTGGAAGGGTTGGTGTGTGGGCGGGGGTGTCAAGAGAGATCACTGGAGGT- 
CAACCCAGAGGAGGCTGACCGGCCATGGAAATTCAGGCACAGAGAGCCCAGGTGAG- 
TAGTGGTGGGGAGACAGCCCTGAATCAGCACTGTGGCTAGCCCATTACTCTATGT- 
CACCmATGCCACTTAGGTAAACACCTCrrrTCCnCTGAGGGTCCCm
CACT TCCACTGGT CCCCTCTTTTCTA'i 1 1CI I I CT TTCI IICI1 rCTCTCTCl I iui I i« 

TCCCTCCCTGCTTGCTTC 
TCTTTCTTTCTI 1 C TTTTCTATCTCGGCTCATTGCAGCCTCAACCTCCCTGGCTTAGT-
GTGATCCTCCGACTTCAGCCTCCCAAGTAGCTGGGATTACAGGTATGCACCACCA- 
CACCTGGCTAACTTTTGTATTTTTAGTAGAGACAGGGTTTCACCATGTTAGCCAGGCT- 
GGTCTTAAACTCCTGACCTCAAGTGATCC GCCTG TCTCTGAAAGTGTTGAGATTA- 
CAGGCGTGAACCACCGTGCCCAGCC^GATrTTTAAAAAATCATTTGTAGAGGCTGGTC- 

TCAAACTCTTAGTCTCAAGCAATTCTCTCACCTCGCCTTCCAAAGTGCTGGGATTC-

CAGGTCTGAGCCATGGCGCCTGGCCTGGTCCCCTTTTTTCAAGTTCCCTTGAAGAGC- 
CCACAACCTGCATAACTATATGGGGCAATTTTGCCTGAAATCCAGGCCTCTGGTCTG- 
GACTGTGGCGAGAGGCTGGCTTTGGAGATCAAGGTGGGAACCAGGCTTACCCTA- 
GAAGGGGGTCCGGCCTGCGGGCCAGGAGGCGCGGGAGAGTCTGACCACAGCGAC- 

TCCAGCTGCTTGGTCAGTTCATCCACCTTGGCCGCCGCCGTGTCCAGCTCCATCTGC-
TTCAGATCCATGTGTTTCATGGCCAGCGCTGGGAAGGTGGGAGTGGAGGTAAGGACC- 
TGGCCTCCTGGCAGGGGCCGGCCTCAGCACCCCTCGCCCGCTGCCGAGGTCCCCG- 
CCTCGCCAGCCCCGCCCCCTACTCCAGC7TACACTGGAAGTTCATGTCCAGAAAGTC- 
CCGCGCGCTCTGGAATGCCTCGCTGTCCATGGTGCCGGCCGGAGCGGGCGCCTG- 
CATGGTGGGGAGGGAGGGAGCTGGCTAAGACCCCGCCCCTCTAGACCCCGCCCT-

CAGGGAGTCAGACGCCGTCAGGAGCGGGACAACGCCTCAACTCAGTTCCTTCCCCT- 
GGAAGCCCTTTACCCTTTCACCTCCCCAGCTGGGAAATGCCAACTCCTCCAAAGC- 
CAAGTCCATGCGGCACGG AGAAGTCCAAACCCAG TCTAAAACCTCCGGAATTCACTT - 
TCTCTTT Ci M Ml iCt M IC1 1 H 1 1 M i 1 M IH 1 I QTGTATGTGTGTGAGAC A- 

GAGTCTCGCTCTGTCGCCCAGGCGGGAGTGCAATGACGCGATCTTGGCTCACTG-

CAACCTCCGCCTCCCGGGTTCAAGCAAATCTTCTGCCTAGCTGGGAGTACAAGCGCG- 
CGCCATTATGCCCGGCrTAATTTTTGTAGTTCTGGGATTACAGGAGTGAGTCTCCGCGC- 
CCGGCCGTGTCCATCTCTTTATCTCAGTCCTAAGACCTGAATCACTCCTTGAACAAT- . 
TATCTATTGATCACCTACAATGTGCCGGTAAACATAGGATQ QAATAA CTA TGAATTA CT" 
GAATGTTTACTAGGGACCAGGACGCACTGTGCTAGATCCTGTTTTTGI rTGTTTTTGA-
GATGGTGTCTCGCATTTTCGCCCAGGCTGGAGTGCAGTGGCGCGATCTCGGCTCACT- 
GCAAGCTCCGCCTCCAGGGTTCATGCCAGTCTCCT GTCT CA GCCTCC CGAQTAGCTGr 
GGACTACAGGCGCCTGCCACCATGCCTGGCTAAATTTTTGTA7TTTTAGTAGAGACGG- 
GGTTTCACCGTGTCAGCCAGGATGGTCTCGATCTCCTGACCGCGTGATCCATCTGCC- 

TCGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGCGCCCGGCCCTTGTTT-
TTGTTTTTTAATAATAATTCTGCTGTCTGCTGTGTACTAGAACCCATGCCTACTG 
GGGTATAATGTAGTAAATGTAGTAAAAACAATATCCGCCGGGCGCGGTGGCTCACGC- 
CTGTAATTCCAGCACTTTGGGAGGCCAAGGAGGGCGGATCACGAGGTCAGGAGAG- 
CGAGACCATCCTGGCTAACATGGTGAAACCCCGTCTCTACTAAAAATACGAAAAAT- 

TAGCCAGGCGTGGTGATGGACGCCTGTAGTCCCAGCTACTC6GGAGGCTGAGGCAG-
GAGAACGGCGTGAACCCGGGAGGTGGAGCTTGAACTGAGCGGAGATCGCGCCACTG- 

CACTCCAGCCTGGGCGACAGTGCGAGACTCCGTCTTAAAACAAACAAATAAATAA-
ATATGTTTAAAACAACAACAACAATAACCAGCCAGGCGCGGTGGTTCACTCCTGTAAC- 
CCGAGCACTTTGGGAGGCCGAGGTGGATGGATCGCTTGAAGCCAGGAGACCAGCCT- 
GGCCAATATGGTGAAACCCCGTCTCTACAAAAAAATACAAAAGTTAGCTGGGCATGGT-
GGCATGTGCCTGTAATCCCAGCTACTCAGGAGGCTGAGGCACAAGGCTCACTTGAAC 
CTGGGAGGCACAGGTTGCAGTGAGCATAGATTGTGTCACTGCACTGCAGCTTGGGT- 
GACAGAGCGAGGCTCTATTTAAAAAAAAAAAAATTAATTGAGGGGCCACTCCCTTCTA- 
GAGTGGTGAGAAATGCCGTGCACCGAAAGCTTCATTTGATGGTCAAAACCACCCTAG- 
CAGGCAAGAAAGCATGGCTCAGAAACATATGTTCAAGGTCACCCTGCAAGAAGTCGG-
TAGTAATCGGTTTCACACCCGCATCTAACTTATTCTGGGTCATCTCTACCAGATTAGAG* 
GGGTCCTAGAGGGAAGCGACTGCTCAGCTTCCTTTCCCTAGGGTCCCCATTCAGTG- 
GAGGTCTGGCTCTCACTGACCCATTGTTAGCAAGAGGAACAGGGAGGTGGCCAGGG- 
GTGGAGGGGCAGCTGTGGTCACTGGCCCAGTGGGAGGGAGCTAGGCCACTAGGAAC-
CGGTCAGGCCAGCACCATCCCTATCCCCATGCTAGCCACCACACCCACCAGCTCTGG-
CACCTCCCTGCTGCATCGACCACTTAGCTCTGGCAGTATAGGCAGCAGGGCAGGCTG- 
GGGCATGCTGATACCCGCCTCTGTCTGGGAAGTCGAAGGAACAGAACCTGTTCAGGC- 
TGGCGGCrCATTTGGATGAACAGGGAGTGTGTGACCTTGGGCGTTGAGTCCTCTC- 
CACTC CCTGGGCCTCAGTCTCCCCAACATCAAAGAAGAAGGCAAATCACC n Ml Ml I -

tttmgagatagggtctcgctctgtaacccaggctacaattgtgactcactacagcc- 
tcttgacctcccagctcaagtgqtcctcccacctcagcctcctgagtagctgagact 
tagqtat agcctcgcaccaccac acccagcta a i m m m m m m m i m m 1 1 m m i- 
tttttttgagacggagtcttgctctgtggcccaggctggagtrcagtggcgggatc-
tcggctcactgcaagctgcgcctcccgggttcacgccattctcccgcctcagcctcc- 
caagtagctgggactacaggcgcccgccactacgcccggctaatttttgtattttag-- 
tagagacggggtttcaccattttagccgggatggtctcgatctcctgacctcatgatc- 
cgcccgcctcg gcctcc caaagtgctgggattacaggcgtgagccaccgcgcccgg- 

ccacccagctaattt7ttaaaaacattttgtacactttgggaggctaaggcgggag-
gatcacgaggtcaggagctcgagaccatcctggctaacacaggtgaaacx:ctgtctc> 
tactaaaaaatacaaaaaaattagctgggcgtggtggcgggcgcctgtagtcccagc- 
tactcgggaggctgaggcaggagaatggtgtgaaccagggaggcggagctttcagt*- 
gagccgagatcgcgccactgcactccagcctcggagacagagcgagactccgtcc- 

caaaaaaaaaaaaaaaaaaaatttgtagagacagatcaagtctgactttgttgctcagg*
ctggttttgaactcctgggctcaagcaatcctcccgcctcagcctcccaaagtgct- 
gagattacaggcatgagccaccacacctggccaaatcagctattctgaaaggcccctt- 
taatctctatgagccccagactttcaaactgtaaggaccttaggactgtaactaaagt- 
tctacagagcctaaacccctcagctaaagagcctattgttggaaagttctgagtccaa- 

gattctatctrtggaacattctagaattctccaatttgtctaacccagaattctgagtct-
ttctgtaccacattctacctaacc^agggttgcactgctctggaagtctagatggatg- 
gtatagtgcagctggtaaaagcatgagtaagaagtcagacttcaaaaattcaaatct- 

GAGGGCCGGGCATGGTAGCTTCTGCCTGTAATCCTTGCACTTTGGGAGGCCGAGGG- 
GGGAGGATCACTTGAGGCCAGGAGTTCAAQACCAACATGGCCAACACAATGAGACGD- 
CATTTCTTAAAAAAAATTAAAATAAAATCATCAAATCTGGCAGCACCACCGTCCAACCC-

tgaccacagtacctcagtctcgtaatccgtaaaatggggatgaaagttcacctcatag-
gactactgtaagaatccacctggtcagaaggtgcaggaagaattcagagctctgaga- 
attgaggcctcaggaagaagagactacaggaataaaaactcgggoatttagaatttca- 
gagatacacaaacaatactttgttaactgttaaaatagataaatgagcaagtctgtg- * 
cagccctaatgccagctgtaagtgact g i i huh icttrrggtagagatfragtctc-
tctcgcgcctgtggttaggctggtctcgaactcctagcctcatgggatcctccccgg-
ctcgatctcccaaagtattgggattacaggcgtgagcacggcgccatgatccccaa- 
atttccaagattctcagattccatactgacattctctggctctcaggaaatgccaaccc- 
tgggtgtggggctgtcgcggggacaggcggtggggacgtcggagccaccaggggg- 
cggtcacgcccggacocccgccaggagggcggactgcgcctgagctcaggcccgg*
ggaatgcgcagcgggcccgggcaggtgctgtacatcccggggcaagggagctggg- 
ccgggcggggtacaagggcggggcgcgggggtggcgcgggccgtgtgtotgttcc- 

CAGGCCTCTGCCCCTGACCTCTGCCTCCGAGTCCTCTCCCATGTGCTCCCCTCTAGC- 
TCTAGCTCCGAGCTCTCCCGCGGGCTCTGGGCCAGCCGCAGGTACTCTCCCCTGGG- 

CTCCTC TCTC CGCTCCACCCCTGGCTCTCCTTCCCTGGCCTCCTCTGCACCCCAGC-
CAGGTTCrrTAGGGCTAAGGATCCTGTGGACTTCCTGGAGGAGTCATCTTCAGTAG- 
GAACCGGGTCAGAGAGCCAGACTGAGCTGGGAACACCCAGGCTGGACTCCTACAGC- 
CCTGTCGGGTCACACTGAATCTGGAGAGGCTCCACTGTCTCTGGGACTCGGTTTCC- 
TCCTTTGTGGACGTCTATGGAATGGGCTAGGGCCTTTCTTGCTCTAAGCCTCTACTT& 

GGCTTGTTATTTAGCTTCTCTGTGCCTGTTTCCTCATGTGGACCATGGGAAGAATTA-

ATACCTTCGCCTCAAAGGGGTATGAGGATTGAGTGACATAATTTATAAGCCGTGATTA- 
GAACAATGCAGTGCGCGAAATAAAGTTCACACATACAGGATTCATAATTACCAGAT- 
GTCCTTGG CTQTTCATT ATAATAACACAGGGTCTGGCAACAGAGTGAGGGGTCCAGAC- 
TCAATGT AAI M M M I IOCCCTAAMGGGCCCTTTCAACTCTTTCTGAGATCATACAAG- 

CCCTGAGTTTTGACACCCAGGGTCXrCAACTTCCTGAGCCCTTGCCTCTCAGAGTCC-

taaatttcccctgtacattcctgagtctggccagtgatcaccctcagtcacttagg- 
gacgggagggctgggagagccctggaagattccagacagaagctggcaaaagcc- 
cagggtgtgggcaatatccactctccagcctccgtttctccactcgtaatgag- 
gagtccttccctggggtcagcaaaccttattcaaagggagacctctcagtcacccaa- 

GATTCCTCTAGACAAT GCGA GCTTTCCTACCTACCTACCTACCAGCTCTGAGCTTGG-
TACACCCAGAGCCCTGTTTTGGCAACCACGGTTATTATrmAATTTCATTT 
TATCATCAAATGCCCTTCAAGCCCAGAGATTGGGAAACACTCCTCTCTCATCAGATGC- 
TCGCCTCCCCCA7TCTGTTTTTAATCCCCCTTCTTAGGACGCATGGGGGTTGAGA- 
(3aacggggagatagacagaggc3aggtgcctggtcctqccctccccccgcctcaag-

gacagacagacacctccagaattagcctctgtccctccttatctcccacaataccc- 

caggtcagacagatgggcgtggaggtgacatttctcacctcagggtcagggcaag- 

GAG(XCTGAGGCAGAAGGTTAGTCAGAAAATCTGGCGGGGGCGGATGGAATCC- 

CGTCCCCCAGAGAGCTGCAGAAGAAGGAGGAGGCAGAATCCTGACCCTACAAACTC- 

TACTGCCTGTGTGAGCTCCAAGCCTCAGTTrACCCCTTCCTCTCCGTGTAATGGTTAA- 

ATGCCCGGCTATGCAAACCTCCCAGAATCCAATAGCCGCTTTCCGGAATTCTGCCCT- 

GGGTTCTAGAACTACCTCTGCAAACCCAGCTGTTTCCCACCCCATAAGGCAATAGGG- 

GAGCCCACCTCCGCCAGGGGGTGCCCTAGGGCGGATGTCCCTTCTCTGGTTAGG- 

GAGGTCTGACGCCCAGGTTAATGACATGTTGGGTTCGCTOAGCGGCACAGAGGAG- 

GTTGGAGATCTGCCTCGGTGTTTTC5TGTCCTACCCCGCCCCCATCCCCGAGC- 

CGAAAAGTCGGGGGAGAGCCGGGACAGAGCCTCCGGAGGGACCCCGGGTACCT- 

GTCCTGCTCCACTTCAGGAACCCAGGCTCCACTATCCCTGCCCCACCCTTAATTCTGC- 

TCAGAG ACCTAG AAGATCGGTCG AGACAGCAGCTTGAGGCT GG CAG GGTGGTCACC- 

CA7TCCACCTTGAGCCCCACCAGTCTGAGCCTCTCATTTCTGACCAAGACTCGGGGAT- 

tcgaacccctatactacccaaagactcggcttcctagagccccccagttcgagggac- 

tcaggaattcgagctccaacgtctccccgggatgaaggggtagaatccctccattc* 

caagaattcaggcatccgaacccgctttccttccctccagtaaaacaggcaacggagt- 

ttdcttctaaggatccaggtgtcggcgcgccccaaattccgccctgggacctgg- 

cgtccgagtcccctcccaatcctcccagggacgcgggtgttgggctttttcagggcc- 

tctggtccccaggagggtgaaactcacggatccgggcagatcctggcacctggggg- 

cttcctccagctcgggctccggcttggggagcggagaacggggcggggcaggagc- 

tgggaacaggttagacgacgtgacttgggctggagggaggcgggtcccggtggg- 

gagggggagccaaggtcgcctcgagcaccttgggacttgtagtcccggagggacag- 

gacgtagcccaagacgatcccatttggarrcacccagagtccatttcacagacag- 

gaagggcgaggcccagaagccgagagcgaccaggccagggagatacagaagagc-. 

cgagacgcctgcctcgctgtggctggagactgactcctgagcccttgccccacccct- 

tcaggcgcactatcccctttcctgatcagtatcccccagggtctctgagcccgaatc- 

tccccgtcgataaaaagcgcgggttggatcttcaaaggatgtcccagcaagagtt- 

caaaatcttagtttggactacaacccccagcagcctccgcgaccgccctcgggcgac- 

tctttgcctcgggtcctgtgggaattgtagtcctggagcccgcagggctgcaccc- 

gggtgtctctctcgcccacgcgaaggaaaccgtctggagatcctggataggggaaa- »' 

catttccccttccccttgaccctccctccgctctggaaagcctctcccacctgggga- ' 

gaaggggtgccccaatrctggagtaggatcetaaatcttggcagagggggcgg- 

gaagtggcgctgacacactggccaggaatgcagtcgggtcaccctgtctagccac- 

ogtctcgcggctccaaccgccgcccaacgcggggcggccccagtgggaagg* 

gaagtgggtgcgtccccgaaatctgtgtccacgtgccgctgtttacacgctccctgg- 

ggcagggaggagtcgccgatcaggtccctitcctgaaagtcatcgaggtttcccacg- 

catgagactaaacccccgagggcatctacaagtcccatttgatccacaaacggtacac- 

cgtqcccagcaccactccacgcgtgtggggctcctgggtccgaggctccgccc- 

tcgagaaccacaagctcctccccctatgtttcccgctcccccggagtccagaagccc- 

CGCCCCTGGCTGGMCTTCACGCCCTCCGGACGGATTGCCCCTATTTCTCCATTTTCC- 

CGCTTCTCCCAGTCAAGTTCTGAACTTGTGAGGCATCTGGGCCTCCCCAGAAGACATT- 

TAACACAGAAAGCACAGCCCTACTAACTAGTATTCTTACC3TGTCTCTTCAAGAATTTCA- 

GACCAATCGACCGTCCTGTCTCTTTAAGGCTTAGGAAGAGCAGTGTGGCTGCCCCTT- 

TAAGGAGGCGTTGCAACAAACCATATTGGACAGACGATGGGGGCGACCCATCGG- 

GACCCGACGGGCCTCTGACTCCAGCAATACAGCGAATCAGCGGCTTTCGGGAATA- 

CATT Tj I CG GAAAAAGACTTCTTCCTCGGTTTTCTGCTCTGCAGACGTTQAAATTTTCC- 

CCAGTTTTTCCTGCAGATCGGGAGTCGAGCAATGCCTACCCCCGCGCTCCCGCAC- 

CAGTTGGGCGCTCCCGGATGATGCCCTACCCCT7TGGATCCACGTGGTCTGCAACCT- 

GGTGCGAGCAGCCCGGGCTACAGGGTTGCCTGAGGTGTGGGTCCCAGGATGGAG- 

GAGCCCCAGGCCGGCGGTGAGGGTGCGGGTTGACGGGGTGCGGAGGGTGCGTTG- 

GTGGAAGGAGAAAGGGGGGTCCGAGAGGGTTCGGGCGGAAAAGGAGGCGTACCTG- 

CAAGCAGGACTTGCGAAGAGCGTGCATTCCCAGTGGGCGAACGGGAATTCGAACG- 

GAGAGAGGGTTATCTTGTGGGGGGCTACCCGTGGAGAGCAAGGCGCCCCCAGGG- 

GTTGGATCGGTGAAATTGAGGTCGCCCCTGGGGAACAGGTGGGCAGAAAGGA- 

GAAACCAGGTTGAGGGGACTGGAGTGCTCACGAGGTTAAGACCAATQGACCGA- 

TAGGCGCGCCCTGCAAGATTGGACCGGCAAGGAGGTGTCAGTCGACCCCATTTCCCC- 
TTCTGCTGCAGATGCTGCTCGGTTCTCTTGTCCCCCCAACTTTACCGCGAAGCCCC-
CAGCCTCAGAGTCCCCTCGTTTCTCCTTGGAGGCGCTGACGGGTCCAGATACGGAGC- 
TGTGGCTTATTCAGGCCCCTGCAGACTTTGCCCCAGAATGGTGAGTGGTCTTGTT- 
GACGGAAAAGAGGGTCCCGGTCCAGACCCCAAGAGCGGGTTCTTGAATTTGTCACAG- 
GAAAGAATTAGAGGTGAGTCACAGAGCACAGTGAAAGAAACAAGTTTATTGGAAAC-
TACTCCTTTACAGAGTAGAGTGTCCTCAGAAAGCAGGGGGAGAAACCCACAGCCCT- 
TTGTTAGTATTTCTACTTATAAGAAACTATAAGGAACTATAGTTAAACTTGGAGTGTG- 
CAGATAAGCTCACTAAAGGTAGGGGCTATTGGTGTTATCCACGACCATTAATCCTG- 
CAACCTAAGCTTGCTCATTTATG1TATATTTAAGTAATGGGGGCTGCATTCTTAGGA- 

CATTTGGACATTCTGCAGGCTTGGTGGAACATGTTCTGTATGGCCATAAATATTCTGTA.
ATTATAATTGGTGGTCAGCCTGGGATGTGGTTATTTTCAGGCCATAAGCATGAACCTT- 
GT AAGTGCCTAGCTACTCACTTTAAGATGG AGTCACTCTAGTCATGTTTTATTAAAAAC- 
CAGAGGCCAGCCAGGCGCAGTGGCTGGTGCCTGTAATCCCATCCTTTGGGAGGC- 
CGAGGCGAGCAGATCACTTGAGGTCAGGAGTTCAAGACCAGCCTGGCCAACATAGT- 

GAAATTGTCTCTACTAAAAATACAAAAATTGGCTGGGCGTGGTGGCAGGTGCCTGTA-
ATCCCAGCTACTTGAGAGGCTGAGGCAGGAGAATCGCTTGAACCCAGGAGGTGGA- 
CATTGCAGTGAGCCGAGATCATGCCACTGCACTCCAGCCTAGGCAACAGAGCAAGAC- 
TCTCTCAAAAAAAAACAAAAAAAAAATCAAAAAACCTTCGCTCTCCTGTTCCACTT AAG- 
CCTCTGCCCTCCCTGTTTCTCTCTGTAGCTTCAATGGGCGGCATGTGCCTCTCTCTGG- 

CTCCCAGATCGTCAAGGGCAAATTGGCAGGCAAGCGGCACCGCTATCGAGTCCTCAG-
CAGCTGTCCCCAAGCTGGAGAAGCGACCCTGCTGGCCCCCTCAACGGAGGCAGGAG- 
GTGGACTCACCTGTGCCTCAGCCCCCCAGGGCACCCTAAGGATCCTTGAGGGTCCC- 
CAGCAATCCCTGTCAGGGAGCCCTCTGCAGCCCATCCCAGCAAGTCCCCCACCACA- 
GATCCCTCCTGGCCTGAGGCCTCGGTTCTGTGCCTTTGGGGGCAACCCACCAGTCA- 

CAGGGCCTAGGTCAGCCTTGGCCOCCAACCTGCTCACCTCAGGGAAGAAGAAAAAG-
GAGATGCAGGTGACAGAGGCCCCAGTCACTCAGGAGGCAGTGAATGGGCACGGGGC- 
CCTGGAGGTGGACATGGCTTTGGGGTCGCCAGAAATGGATGTGCGGAAGAAGAAr • 
GAAGAAAAAAAATCAGCAGCTGAAAGAACCAGAGGCAGCAGGGCCTGTGGGGACA- * 
GAGCCCACAGTGGAGACACTGGAGCCTCTGGGAGTGCTGTTCCCGTCCACCACCAA- 

GAAGAGGAAGAAGCCCAAAGGGAAAGAAACCTTCGAGCCAGAAGACAAGACAGT-
GAAGCAGGAACAGATTAACACTGAGCCTCTAGAAGACACAGTCCTGTCCCCGAC- 
CAAAAAGAGAAAGAGGCAAAAGGGGACGGAAGGGATGGAGCCAGAGGAGGGGGT- 
GACAGTTGAGtCTCAGCCACAGGTGAAGGTGGAGCCACTGGAGGAAGCCATCCCTCT- 
GCCCCCTACGAAGAAGAGGAAAAAAGAAAAGGGACAGATGGCAATGATGGAGCCAG- 

GGACGGAGGCGATGGAGCCAGTGGAGCCGGAGATGAAGCCTCTGGAGTCCCCAGG-
GGGGACCATGGCGCCTCAACAGCCAGAAGGAGCGAAGCCTCAGGC.CCAGGGAGCTC- 
TGGCAGCTCCCAAAAAGAAGACGAAGAAAGAAAAACAGCAAGATGCCACAGTGGAGC- 
CAGAGACAGAGGTGGTGGGGCCTGAGCTGCCGGATGACCTTGAGCCTCAGGCAGO 
TCCCACATCCACCAAGAAGAAGAAGAAGAAGAAAGAGAGAGGTCACACAGTGACT- 

GAGCCAATTCAGCCACTAGAGCCTGAACTGCCAGGGGAGGGACAGCCTGAAGCCAG-
GGCAACTCCGGGATCCACCAAGAAGAGGAAGAAGCAGAGTCAGGAAAGCCGGATGC- 
CAGAGACAGTGCCCCAAGAGGAGATGCCAGGGCCGCCACTGAATTCAGAGTCTGGG- 
GAGGAGGCTCCCACAGGCCGGGACAAGAAGCGGAAGCAGCAGCAGCAGCAGCCT- 
GTGTAGTCTGCCCCCGGGAAACTGAGGAACTAAAGAAAGCTGAAGGTGCCCACCTG- 

GGCCACCAGAAGGTGACACCCCCAGAATCCGTCCCCAGAGACTGCACCAGCGCAGCC

Sequence of the s region of chromosome 19 

The following depicts the region s as described above.
More specifically, s is bounded by and includes the following two sequences:
GGCGCCGGCCGGACTGTGCAG and CCAGAGACTGCACCAGCGCAGCCCAGCtTGAGCAAGATAGCG, and is defined by SEQ ID NO: 2
herein below:
GGCGCCGGCCGGACT6TGCAGCGGGGTCGACCCGCCTCCCTCATGAAT
ATTCAGCGAGAGGCCGGGTCGTGGACATCCTCGAGGGCTCGCTCCACCr 
WTTACGAGACCATTGGCTAACCTGCCCGTCAATCCGCTAGGGCAGAGCAATC 
GGGATACTGCGCGTGCGCACGGAAAAGCGAGGGCGGCTGACTCTCGGGT 
GAGGCGGTGCGGGAGGCGTCACTGAGGATCGTCGAGGGCC AATCAAAA
GAAAACATGGAAGGGAAAGAGCCGAGAGACTCGATCTCATTCACTAGAA 
TTTGGTCCTCCTGCGCCTGCCAAGATrGTCTGA 
GTATTGATCGAACCCAGGAGTTCGAGATCAGCTTGAGCAAGATAGCG 

AGAACCCCCGCCCCTCCACCTC6TCTCAAAAAAAAAAAAAAATCGTCTCAGTA6CGAAT 

AGTCTAACGQAGAATGACAOGGAAATTGGTGATCCTTTCTGGGCCCAAGAGTTAGAA-

ATGGCTTTGCAGGCCGGGCGCGGTGGCTCAAGCCTGTAATCCCAGCACTTTGG- 

GAGGCTGAGGCAGGTGGATCACCTGAGGTCGGGAGTTCAAGAOCAGCCTGACCAA- 

CATGGAQAAAAGCTGTCTCTACTAAAGATACAAAATTAGCCGGGCGTGCTGGCAAATG- 

CTTGTAATCCCAGCTACTCGGGAGGCTGAAGCAGGAGAATTGCTTGAACCTGGGAGG- 

CAGAGGTTGCAGTGAGCAGAGATGGCGCCGTCGCACTCTAGCCTGGGCAACAAAAG-

CGAAACTCCATTTCAAATATTAATAATAATAACTAATAAATAAAACATAAATGCTAGCTT- 

GGCCCCTGGGAACAGAAAGGTGACCATGACAGCCTCAGCACCTGCCCTCAAAGAACAr 
Q Al i I T iM ' i 'CCTTGAGACAGGGTCrTTCTCTGTCGCCAAGGCTGGAGTGCACTGGCA- 

CAGTCACAGCTCACTGCAGCCTCCACCTCTTGGGCTCAAGCGATCCTCCGACCTCA^
CTTCCAGAGTAGCTGGGACCACAGGTGTGCACGACCAAGCCCAGCTV^GTm 
TTAAAI I I I I I IAGAGACGAGGTCTCACCACGTTGCCCAGGCTGGTTAAACTCGCAG- 
GTTCAAGTGATCCTCTCCCCTCAGCCTTTCAAATTGTTGGGAT^ 
CACCAGGCCTGGCCTCAAAGAACAGATATTAAATATAGAAATGAATATATGATTAC^ 

CTGGAGTGGTGGCTCGTGCCTGTGGTTCCAACACTTTGGAAGGCCAAGGCGAGTA-
CATTGCTTGAGCTCAGGAGCTAGAGACCAGCCTGGGCAACATGGTGAAAACCCGTC- 
TCTACAAAAAATGCAAAAATTAGCTGGGCGTGGTGGCGTGCACCTGTAGTCCCAGA- 
TACTCAGGAGGCTGAGGTGGGAGAATCACCTGGGCCTGGGAGGCAGAGGTTGCAAT- 
GGGCAGTGATTGTGCCACTGCACTCCAGCCTGGGCAACAGGAGTGAAAACCTATCT- 

CAAATGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGCGCACGTGTATAATCACAAGTA-
CAAAAGTGCTGTGAAGGAAAACTTCAAGTCACCATAAAGATTGATTATGGGCTGGGTG- 
CAGTGGCTCATGCCTGTAATCCCAGCACTTTGGGAGGCCAAGGCAGATGGAT- 
CACGAGGTCAGGAGTTCAAGACCAGCCTGGTCAACATGGTGAAACCCTATCTCTAC- 
TAAAAAAAAAAAAAAAAAAAAAAAAGCCAGGCATAGTGGCATGCATCTGTAATCCCATC- 

TACTCGGGAGGCTAAAGCAGGAGAATTGCTTGAACCCAGGAGGCAGAAGTGAGCCAA-
GATCACGCCACTGCACTCCAGCCTGCGTGACAGAGCAAGACTCCGTCCCAGAAAAA* 
GAAAAAAAAAAAAGACTTATTATGACAGGATGTCT ACTGTCAACTGTGGGGTGTGAGT- 
GTTGGCCAAGTGATCAGAGAAGGCTTCGTGGAAGAAGCGAGGTTTGAGTAGAGCCA- 
GAAAATAATTAGAAGAGATCAACCAGCAAGAGGGGATGGATGAGAGAAGTGAGAAAG- 

GTGTTCCAGGGAGAGAGACCATCATACACAAAAGCTCTAGGCCAGAAGAAAGCT-
GAGGCCTGTGAGTGCTGAAAGGAAGCCTGTGGGGGTGGAGCTCTGAGTTGAGCA- 
CAGGGAGCAGAGAAAGGGCAGCTGGAGGGGAAGGCAGGGGCAGATCGAAATCTCTT-
TTrTAAATTAATTAATTCTT^ 

CAGACTGGAGTACAGTGGCACAATCTCAGCGCACCGCAACCTCTGCCACCCAGGCT- 
CMGCMTTCTCTGGCCTCAGCCTCCCTAGTAGCTGGGATTACAGGTGCGCACCAC- 
TACTGCCCAGCTAATTTTTATAC1TTTAGTAGAM

CTGGCCTCAAACTCCTGACCTCAAAAGATCCACCCA 

GGATTACAGGTGTGAGCCACCCTTCCCGGCTGTATTT7TGGAGACAGAGTCTTGCTCT* 
GTCCCAGCCTGGAGTATGGTGGTGTGAATTTGGCTCATTGCCACCTTGACCTCCAG- 
GGCTCAAGTGATCCTCCCACCTCAGCCTCCTGAGTAGCTGGGACTGCGGGTACACGA- 
CACCACGCCTGGTTA AI M I NI I A ATTTTTTGTAGAG ACG AGGGTATCTCACTATGTT*
GTCCAGGCTGGTTGAACTCCTGA6CTCAAGCAATTCTCCCACCTCAGCCTCC- 
CAAAGTGGTGGGATTACAGACGTGAGCCACTCTGCCCGGCTrAATTTATlTACATAA- 

ATTTTTTTATGTITACTTTTCTATCTCCT^^ 
GGTCTCGCTATGTTGCCCAGGCTGGTATTGGGCTCAAGCCATCCTGTTCCCTCAGCC-
TCCCAAAGTACTGGGATTACAAGCGTGAGCCTCTGCATCCAGCCCAGATCCAAAATCT-
TTACTGTCACCTACAGAGTCCTCTGTAACTAGCTTACTGCTCATCATCCCCATACCAAC- 

CCACCTTACTGCTCTGATCTCCTC^ v 

CTGGTCTCCTTGCTGTCTCTAAAACATAACAAGCACATCCCATCTCAGGGGCTTTG- 

CACCAGCTATTTTGTCTGCCTGGAATGCTGTTTCCCCTGATAGCCATGTGGCTGACA- 

CACTCACCTCCCTCAGCTCTTTGCTCAATTGTCAACTTCTCGGCCCGGCATGGTGGCT-
CACACCTGTAATCCTACCACTTTGGGAGGCTGAGGTGGGCAGATCACCTGAGATCAG- 
GAGTTCGAGACCAGCCTGGCCAAGATGGTGAAATCCCGTCTCTACTAAAAATACAAAA- . 
ATTGGCAAAGCATGGTAGCACATACCAGTAATCCTAGCTACCCGGGAGGCTGAGG- 
CAGGAGAATTGCTGGAACCCGGGAGGCAGAGGCTGCAGTGAGCCAAGATCATGC- 

CACTGTACTCCAGCCTGGGTGACAAAGCAAGACTCTGTCTCAAAAAAAAAAAAGTCTC-
CTTCTCAATGAGGGCTTCCTGACCACCAAATTAAATCTACCTCCTAGACACACACACA- 
CACGCACGCACGCACGCACACACACACACGCACGCACGCACACACACACACACACA- 
CACACTATATCCCCTTTCCCTGCTTTATTGTTCTTGAGAGCTCATTTAACCA 
CATGCTGAATATTTTACTTATTT A 'TTT I G I H A QAAAGCTCCTGGCTGGGCGCGGGGGC- 

TCACGCCTGTAATCCCAGCACTTTGGGAGGCTGGAACAGGTGGATCATGTGAGGT*
CAGGAGTTCCAGACCAGCCTGACCAACACGGTGAAACCTCATCTCTATTAAAAATG- 
CAAAAATTAGCTGGGTGTGGTGTCGCATGCCTGTAATCCCAACTACTCAGAAGGCT- 
GAAGCAGGAGAATCGCTTGAACCTGGGAGGCAGAGGTTAACGCTGAGCCGAGATCG- 
CGCCATTGCACTCCAGCCTGGGCAACAAGAGTGAAACTCTGTCTCGAAAAAAA- 

CAAAAGTCAGCTCCATGGCAGGAGTGATGGCTCACGCCTATAATCCCAGCACTTTGT-
GAGGCCGAGGCGGGCGGATCACTTGAGGTCAGGAGTTGGAGACCAGCCTGGCCAA- 
CATGGTGAAACCTCATCTCTACTAAAAATACAAAAATTAGCCGGGCGTGGTGACACAT- 
GTCTGTAGTCCCAGCTACTTGGGAGGCTGAGGCTGGAGAATGGCTTGAACCTGG- 
GAGGTAGAGGTTGCAGTAAGCCAAGATCGCGCCATTGCTCTCCATCCTGGGCAACA- 
GACTCCGTCTCAGAAAGGAAGAAAGAAGGAAAGAGAGAAAGAGAGAAAGAGACAGA-
GAGAGAGAGAGAAAGGGAGAAAGAGAGAAAGGATGGAAGGAOOTGACAAGCACT- 
GTTGCATAAAAG I IICU I ICTCTCTCI 1 1 1 1 1 1 1 M J I I II I 1 1 1 1 1 1 GAGACAGGGTC- 
TCACTTCrTGTTGCTCCAGCTGAAGTGCAGTGGTGAGAACATGGCTCAGTGCAGCCT- 
CAACTTCC^AGGCTrAAGTGATCCTGCCACCTC^GCCTCCTGAGTAGCTGGGACTG.

TAGGTGTGCACCACCGTGCCTAGCTA Al MM I GT AI I \ \ I A GTAGAGACATGGTTCCG- 
C^ACGTTGCCCAGGCTGGTCTTGAACTCCTGGGCTTAAGGGATCTGCCCGGCATGGC- 
CTCCCAAAGTGCTGGGATTACCAGCGTGAGCCACTGTACCCAGCCTGAGTATAGGTT- 
TCTGATAMTTTTAGGATCATATTGTTTGGACTGGGTAAGAATTTCCAGAAGTCTaAT- 

GAAGAAACTGACTGGTTTATATTTTAT^

TCACTCTnTGTTGCCCAAGCTGGATTGCAGTGGCACGATCnTGGCTCACCACAACCTC- 
CGCCTCCCGGTTTCAAGTGATTCTCCTGCCTCAGCCTCCCCAGGAGCTGGGATTA- 
CAGGCACCCACGACCATGOTCGGCT AH rTTTTTTTTATTTTTTTATTTI I A GTAGA- 
GACGGGGTTTCACCATGTTGGCCAGGCTGGTCTCGAACTCCTGACCTCAGGTGATC- 

CACCTGCCTTGGCCTCCCAAAGCGCTGGGATTACAGGCATGAGCXJACTGTGCAAGGC-
CTAGGCTGGTTTATAAAATTGCTAAACCAAGCAGAACATGAATTAAATACCAAGGAA- 
ATACTCTCCTAGATTGTCATGTTACATCAGCCAATACTAAAATTGTCAAGATACACAAT- 
TTGAATGAACTCCATGGTCCAAGTCGAATTATCTATGATATTACCCATCTAATAAACAG- 
CACTATGTCCCTTAATGGGAGAAAAAGTTGGAGAATTTAAGAGAATATCAATCCAAT- 

GTTGGTTGGGTGCAGTGAATCATGTCTATATTCCCAGCACTTTGGGAGGCCAAGG

CAGGAGGATCACTTGAGCCCAGGAATTCAAGGCCAGCCTCGGCAACAGGGTGAGATC- 
CTGTCTCTACGGAAAATTAAAAAAAAAAAAAGAGAGAGA7TAGTGGGATGTGGTGCC* 
TATAGTCCCAGCTACTTGGGAGGCTGAGGCGGGAGGATCATTTAAGCCTGGGACGTT- 
GAGGTTGCAGTGAACCATGAGTGAGACTCATCTCAAAAAAAAAAAAAAAATGGCGAT" 

cactagaggaaaaaaaaactaaagtggggtttgcgggtagtgggagggcccttcctg-
ctaggttgcactatgatctccagggaggctccacgggagaatcatttccttgtctttt* 
tcagtttctagagccaaattctttgcataccttgcattccttg 
ctaaccttcaaagctggcagctagcctctggctcaagtgtcacatg 
gtcttcctatccaatcttcctcttataagaacattggagccaggcatggtggctgacg- 

cctgtaatccgagcactttgggagaccgaggcaggcggatcacaaggtcaggagt-
tcgagaccagcctggccaacacagtgaaaccccgtctctactaaaaaaata- 
caaaaaagtagccgggcatggtggcaggtgcctg™^ 
gaggcaggagaatcgcttgaacctgggaggcagagcttgcagtgagccgagatagt- 
gccaatgcagtccggcctgggcgaaacagcgagactccgtcgcaaaaaaaaaaaa- 

ataataataaataataaataaaaataa^

ataaaaattattttgagacaaagtctattctgtggcagaggctggaatgcagtggcgt- 
gatc^cagcttactgcagcttctacctcct^ 

tcctgagtagctgggacctcaggtgtacattaccacgctcagctaattatttatttam 
tattatatttttgtgacggagtttcgctcttgttgcccgggctogagtgcaatggtg^ 
TATCTCAGCTCACTGCAACCTCTGCCTCCTGGATTCCAQTGATTCTCCTGTCTCAGCT-
TCCTGAGTAGCTGGGATTACAGGTACATGCCATCACGCGCAGCTAATTTTTGTATTTT- 
TAGTAGAGACGGGGTTTCATCATATTGGTCAGGCTGGTCTCGAACTCCTGACCTCAG- 
GTGATCCACCTGCCrTTGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGGCACCACG- 
CCCGGCA AI 1 11 U III id rTTTTTTTTI I CAGACAG AGTCTTGCTCTGTCAGCCAGGC-
TGGAGTGCAGTAGCGTGATCTCGGTTTACTGCAACCTCCATCTCCCGGGTTCAAG. 
CGATTCTCCTTTCTCAGCCTCCCAAGTAGCTGGGACTACAGGTGCACACCACCACGG- 
CGGGCTAATTTTTGTATTTTTAGTA^ 

TCAAACTCCTGACCTCAGGTGATCCATCTGCCTCAGCCTCCCAAATTGCTGGGATTA- 

CAAGCGTGAGCCACACACCTGGCTTMTTTTTTTATTTTTGA

TATGTTGTCCAAGCTGGCAGAGATTTTTGTTTG I FTGTI I GAGAGGGAATTTTGCTCTT-
GTAGCCCAGGCTGGAGTACAATGGTGCAATCTTGGCTCACCACAACTTCCGCCTCC- 
CGGGTTTAACAGATTCTCCTGCCTCAGCCTCCCAAGTAGCTGGAACTACAGGCACC- 
TACCACCACACCAGGCTAATTTTTGTGCTTmAGTAGAGA^ 

GGCCAGGGTGGTCTTAAACTCCTGGCCTCCAGTGATCCACCCGCCTTGACCTCC-

CAAAGTGCTGAAATTACAGGCGTGAGCACCGCGCCTGGCCTCTCAACCTACAATTT- 
CAACACCCAAGGAAACAGCCCACCATGAGTGAGAACCAGCAGACACAACAAACTA- 
TAGGATTAGCTGCCTCCAAACTTCAGGTGATAGATTATCAGGCATGTACTTGAAAC- 
TAAAGGACACAAAAGAAGMTCCGAAATATAAAATAAAGGATTGGACTTGTGTGAAAA- • 

GAATCCCTTAGAAAGGGCTAClTrCAGGCTGGCCATGGTGGCTAATGGCCTGTAATC-
CCAGCACTTTGGAAGGCCGAGGTGTGTGGATCACCTGAGGTCAAGAGTTCAAGAC- 
CAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTGAAAATACAAAAATTAGCGAGGT- 
GGGGTGGCAGATGCCTGTAATCCGAGCTACTCGGGAGGCTGAGGCAGGAGAATCGC- 
TTGAACTCAGGAGGCAGAGGTTGCAGTGAGCTGAGATTGCGCTATCGTGCCCCAGCC- 

TGGGCACTAGAGTGAGATCAAAAAAAAAAAAAAAAAAAGAAGAAGAAGAAGAAAGGGC-
TACTTTCAGACTGCCTTGCCAAAAATCATAACCACAATGATGAGCATGTATTGAGT- 
CAAAACAGAATCAAAAGAGAAGAAAGTCAATTTCTGTGCAAACTACTTTTATTTATAAG- 
GAAAGTTTCTCT ATTTTGTTTATAAACATTAAACCAGTGCTGTGTGAAGGCACTTAATT- 
GGGGAGAGGTGGGGCAGGGATCCTGGTAGAGACCAATGTTTCCCACCCAGACCC- 

CAAGACTGCTGGGAGAGATGGTGTCAGCAGTGACTCCCAGGAATATCCAGTGGTGTG-
GTGGCCCATCCCAGGCCCGGCTGGGCAGGTGGCTGGCTTGCTGGGGGATGTGAT- 
GATGGTGGTAGGCATGGGAGGCACTTTGGACGGGATCTGATTTGGCAAAAGGAAGTG- 
GTTTCCTGTCCCCAGTGATTTGCAGCCCTTCCCAGACCTCCCAAGGCTAAGGCAGAT- 
TACTAAATTTAAGGCTGGGGCCCTCCTTCTTCCCTGGACTTCCAGGAGAACAGAGAAC- 
CGGTGGCAAGGACCACCACCAGCAG6GTGAGGGGTGCAGATAAAGGCAGCAAAAAA-
CAGAGGGAGAGGTCTGGAGGGAAGGCAGGAATGCTTGTTTCTGTCAGCCTCAGAAAC- 
CTCCTTCTATCCTGCTAGACTTTACTCCTTTGAGGCTTCACCCTGGGGAACAGCTG 
GAGAGACAGGATCTTCAGACATCAGGAGCTCCCACCTCCTCATCCCACATGCAAATC- 
CGCTGCCTGTCTCTATCCTCCCACCCCTTCCTAAGGGGACCTCTCAGCACCTCC- 
CAAACTGCTCCAGAATCCAAGTTCTC^GTCACCTCCAAGAACCAGATGGAACCTTCCA-
ATCAGAGCCTCCACTGATGAAATGGAATA^ 

GAAGCCCACCTCTCTCTAACACCTTGGnrTGT C 1 1 1 1 1 G GGTCCCACCTCCATATT- 
TAAAAAATCTCCTCTCTCAGGGCCGGGAGCAGT6GGTCACACCTATAATCCCAGCAGT- 
TTGGGAGGCCGAGGTGGGTGGATGACCTGAGCTCAGGAGTTCAAGACAAGCCTGGT-
CAACATGACGAGACCCTGTCTCTACTAAAAACACAAAAAATTAGCTGGGCGTGGTGGT- 
GCATGCCCGTAATCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATCACTTGAATCCG- 
GGAGGTGGAGGCTGCAGTGAGCCAAGATCGCGCCACTGCACTCCAGCCTGGG- 
CGACGCAGCTGAAGCTGTGTCTCCAAAAACAAAACACACACACACACACACACA- 

GAAAAAAAAAACCAAAATAAAAAAATCTCCCTTCTCAGGAATGTAACGGAATOTTCCTT-
GCCTrCTCCCCTAACCCTAATAGAGAATTTTCCTCAGTTACACTGTA^ 
G AI J I I I CCTCATTCTGCCCAATGCAGTGTAATGAAAGCTTCCTCTCCATCTGTTATAT^ 
TATATATAAATATATATTATATATTTATATATTATATATTTATATA^ 
TATTGTCACCCAGGCTGGACTGCAGTGGCACCATCAGGGCTCACTGCAGGATCAATC- 

TCCCAGGCTTAAGCGATTCTCCTGTGTCAGCCT

CACCCGCCACCACACCCGGCTAA Cn I H H I I I I QT A1 i ' I TT A GTAGAGATGGAGTTT- 
CACCATGTTGGCCAGGCTGGTCTAGAACTCCTGACCTCAGGAGATCCGCCCGCCTT- 
GGCCTCCCAAAGTGCTGGGATTACAGGTGTGAGCCACCTGGCCGGGCCCTCCACT- 
TCCTTCTTOTACATTGCTGAATCCCTGTGTCAGCGCTAGAGGTCCAGTCTTT^ 

TCCCAGCCTTAATCTACAATTCTGTAACCCACCCACCATCATTAAAATGAGATTCrrTCT-
TTGTC^TTCCCTTGGCTAAAATGGATTATTCTTTAACCTCTCCACCAATACAACCAGG- 
GATGATAATAAAAACATTGGATTGAGCAGAAACCAATCAAATAACTAGTAAGGCAGTAC- 
TGGCGAGCACCCTACATCCTGACAGCTTTATAAAGGGCGCTTCCAGCCAGGTGCGGT- 
GGCACATGCCTGTAATCCCAGGAC7TTGGGAGGCTGAGGCGGGCAGGTCACCTGAG- 

GTCAGGAGTTCAAGACCAGCCTGGCCAACGTGATGAAACCCTGTCTACAGAAAATA-
CAAAAAAAAAAAAAAAATTAGCCGTGCGTGGTGGCATGCGCCTGTCATCCCAGCTAC- 
TCTGGAGGCCMGGAGGGAGGATCACTTGAGCCCGGGAGGCAGAGGTTGCAGTGAG- 
CCCACATCTTATCACTGCACTCCAGTCTGGGTGACAAAGCAAGACTCCATCTCAA. 
ATAAATAAATACAAATTGGCCGGGTGCGGTGGCTCATGCCTGTAATCCCAGCACTTTG- 

GGAGACCAAGGCAGGTGGATCATTTGAGGTCAGTAGATCAAAACCAGCCTGGCCAA-
CATGGTGAAACCCOGTCTCTACTAAAAATACAAAAAGTAGCCGGGCGTGQTGGTGGT- 
GGGCGCCTGTAATCCCAGGCAGGAGAACTGGTTGAGCCCGGGTGGGGGGGGCC- 
CGAGGTTGCAGTGAGCACAGATGGCGCCATTGCACTCCAGCCTGGGCGACAGAG- 
CGAGACTCCGTTTCAGAAATAAATAAATAAAATAAAAATAAAAATAAAAAAATAATAGAA- 

ATTTAAAAATAAAATAAAGGGCTTTTCCTCACCTACTCCACTAACnATAAGGGACCCT-
TACCCCCGACATTACTATTAAATAT^ 

ATGAGC 1 1 1 I CAGACCTCCCTCTCCCAATATAACGGTTTGTTCCTGTTGCCTCTTCTTT- 
TTCCTGTGGGATCCCCCTTTTCCCCAACCCCCAACTGTCGGGAGGTCCCCATGACTTC- 
TCCCCTGGGCTCACCCCGAAGTAGTTCCGCGGCACGTAGCCCTCCTGGCCGTGCAG- 
CGCQOCCCACCACCAGTCGGTCTCCTCCGGCCCGTCCCTCCGCAGCACQQTGAC-

CGACTCGCCCTCGCGGAAGGACAGCTCGTCCCCGAACTCGGCGCTGTAGTCCCAGA- 

GAGCGTACACTGCCCCGCTGTTCATCAGCCCCATACTCTGCTCGACGTCTGAAACAT- 

GCCACGGAGGGGAAGGTGAGAGCCTGGCCCAGGGGGTCCAGGAACAGGGGC- 

CACGTGGGGTCCAGGACAGACCCTGGAATTTGGCGCCTGTCCCAGCAACCACCTGAA- 

ATGTTGTGTGTGCCCATGGCTGTGGATGGGAACCGGAGCTGGAGTCAGATGCCG&- 

GACTGGCCGTCTTTGAGCGTTCGAGGAAACTGGGGGAGGCATGCCAGTGGGCCACC- 

CACTCCCGAGGCAGGGTCAGAGGCTCCCATTTCTTTTCTTT C I rTTTTTTTTTTTTTTGA" 

GACAGAGTCTCGCTCTGTCGCCCAGGCTGGAGTGCAGTGGCACGATCTCGGCTCAC- 

TGCAACCTCCGCCTCCCGGGTTCACACCATTCTCCTGCCTCAGCCTCCCGAGTAGCT- 

GGGACTACAGGCGCCCGCCACCACGCCTGGCTAATTTTTGGTATTTTTAGTAGAGT- 

CAGGGTTTCACCGTGTTAGCCAGGATGGTCTCGATCTCCTGACCTTGTGATCCGCC- 

CACATTGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGCGCCCGGCCTTT- 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 I 1 1 1 1 rGAGATGGAATTTCGCTCTTGTCGCCCAGGCAGGAGTGCA- 

ATGGTGCGGTCTCACTGCAACCTCCGCCTCCGGAGTTCGAGCCATTCTCCTGCCT- 

CAGCCTTCCAAGTAGCTGGGATTACAGGTGTGCGCCACCATGCCTGGCGAATTTTTG- 

TATCTTTAGTAGAGACGGGGTTTCACCATGTTGGTCAGGCTGGTATCAAACTCCTGAC- 

CTCAAGTGATCCACCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGC-. 

CACCTGGCCCGGCCCTCATTTCCTTCrTGTACATTGCTGAATGCCCGTGTCAACCCTA- 

GAGGTCCAGTCTTTTGCCCTACCCTGGCGCTTAGCTTAAGTGGTACAGTCTCTAAG. 

GAAGATTCGCaCCTTCCTTGAATGATAGGGTC 

TrTTCTTTTCTTtTCTTTTCTTTITGG^ 

GAGTGCAGTGGCGCGATTTCGGCTCACTGCAACCTCCGCCTCCTGGG7TCCAGCAAT- 

TCTCCTGCCTCAGCCTCCAAAGTAGCTGGGACTACAGGCCCACGCCGCTACACCCGG- 

CTAAATTGTTITATATTTTTAATAGAGACGGGGTTTCACCGTGT^ 

GGAAATCCTGAGCTCATGCAATCCGCCCGCCTCGAGCCTCCCAAAGTGCTAGGATTA- 

CAGGCATGAGCCACCGCGCCTGGC 1 U C 1 11 TT CTTTT C IIHCIHIHiMU C AGA- 

CAAGGTCTGACTCTGCCACCCAGGCTGCGGGAGTGCAGTGGTGAGATCAAGCTTACT- 

GCAGCCTCGAACTTCCAGATTCAAGCAATCCTCCTGCCTCAGCCTCCTCCTGATTCTT- 

TATGTTATTATTAAATATTTTGTAGGCCGGGCACAGTOGCTCACACCTATAATCACAG- 

CACTTTGGGAGGCCAAGGCAGGCGGATCCTCTGAGGTCAGGGGTTTGAGACCAGCC- 

TGGCCAACATGGCAAAACCCCGTCTCTACTAAAAATACAAAAAAAAAAAAAAAAAAAGT- 

TAGCGGGCCGTGGGGCCCTTGCCTGTAATCCCAGTTACTCGGGAGCCTGAGGCAG- 

GAGAATCGCTTTCACCGAGGAGGCAGAGGTTGTAGTGGGCTATGGTGCCATTGCAC- 

TCCAGCCTGGGTGACAGAGCAAGACTCTGTCTCAAAAAATAAATAAATAAAAATAA- 

ATAAATATTTCGTAGAGGTCAGGTGTGGTGGCTdACACCTGAATCTTAGCACTTTGG- 

GAGGCCAAGGTGGGCAGATTGCCTGAGCTCAAGAGTTCGGGACCAGCCTGGGCAA- 

CACTGCAAAACCCCTTCTGTACTAAAAATACAAAAAAATGAGTCGGGCATGGTGGT- 

GAGCACCTGTAGTCCCAGCTACTCAAGAGGCTGAGGCAGAGAATTGCTTGAATCCAG- 
GAGGTGOAGGTTGCAGTGAGCCGAGATTQAGCCACTGCACTCCAGCCTGGGTGAr
CAGTGAGACTCTGTCn"CAAAAATAATAATAAATAAATATTTGTAGAGACAGGGGGTCTC- 
TACAATGTCTTGTAGCCTGACCAGGCTCACCnTrCAAATATATAACCCTCTGTCTCACC- 
CATAAGTCCTAGGACCTGCCTCACTCCAACTCTCCGTGAAGTTCCTTGCCCACACCGA- 

GATACAACTGGCTGCTCCAGGTGTGAAATGACCCTGTGCACAATCCCCGTGGCACAG-
CCTACTTCGCCCTGCCCGTCGGGGAACCAGGTGATGTAGCCTGCCCCCTGGAGAGA- 
TAGGGTACAGCCTTGTGTCTTCCTACAAGCCCCTTTCTGGCAGCTGTAGCCTGCTCAC- 
CTOCCAGTGGTGTGGCAATGCCTCTCCCACAAGTGGCAGAGCCCACCTGCCCAGAG- 
CCCTATGCCAGGTAGATGGCAGGGTTGAAACGTTCAGCTCCTCACCCTTGAAGATGT- 

GAAAGGTGAGCAGACCAATCTTCACAGCCACTCTCCTCCCCAAAGGTGTCCAGCTCG-
CATAGCACAGCCTCCATGTCCCCT7TTCCCTTAGGAGGGCATAGTCCCCCCACCCO 
CGCAAGCGGTCCATCCCTCATCCTCCTCCTCGGCAATCCTGCCAAGTGGTTGGTAr 
CAGCCCCCATACCCTTCTCTCCCTAGTAGGGGGTAGTTGCTCCCCTCCCCGCTCCTG- 
CGCAGCCGCCAGGTACCCAGGCGCCAGCAGCCCTGCCTCGCACCTGCCAGGTAGGT- 

GGCGCAGTCAGCATAACCCTCGCGGTAAGGGTCGCACTTCTCGAAGGCGGTGGCGC-
CGTCGCTGAGCGTGGTGGCGAAGATTGCAGCGQCGTGCTGCACCAGCGCCATGCA- 
GATGACTGTGTCGTTGCACGACGCCGCGCAGTGCAAGGGTGTCCTAGGCGTGGGG- 

gtggg6ggttgcggggaacgatgcgtgagaggct6cgcgtccgcccacgggggac-
ccascccaccgcgcgggtcggggctcaccagccgtggctgtcggggfeagttga-
cattggcacccgcggtgatgaggaaatccacgatagagtagttggcgccgcagat-
ggcgttgtgcaaggcagtgatgccctcctcgttgggctggctcgggtcgttcatct-
gagtgcaccgggggagggggaagactcagtcccgcggctggcatgtgcgatgccc- 
ccgccgtgcccacctcccgctcagcagcgctcacctccttcaccgcotgctgcac- 
cacctccagctccccggtcagcgccgcgtccaggaggagcaccagagggttgagg- 
cgcgcgcggcgggccttgcgcggggagcccgccttccgcagcacagagcgcatc-
tcctgggggacagggcgcagaggtcagcgacttggagggattgttagtatatccat- 
gatctagagtaggaaacagaggtccagggacttgtggcacccatctagacaggggta- 
gaactgggattccctcgggatggggtgagggggtgccttcgatctcctcctagagcc- 
tcgagttccctgccatagacagggaatcctgtgatttgagaatcttgggccctgaaac- 

TTGGGAGAAAGCTGGGGGGCCATGGGATTGGTGGCAAAGTAATTCTATCAGTT-

CAAAACAATGATTGTGGAAGCCAGTTATGCAATTCACACACAGTCTCACATTTCTTTT* 
GTTAATAATGAATGCAATGAGACACACATGACAAAATGTTACCAGGAGTGTTCATTC- 
CGGATGTTTGGAATTTGAGCATTTTATTAT^ 

rrrTTTTTTTTn IG AGATGGAGTCTCGCTCTGTCACCCAGGCTGGAGTGCAGTG- 
CAGTGGTGTGATCTCAGCTCACTGCACCCTCCATCCCfcCAGGTTCAAGCAATTCTCCT-
GCCTCAGCCTCCTGAGTAGCTAGGATTACAGGCATGC 

TCATATTTTTAGTAGAGACAGGGTTTTGTCATGTTGTCCAGGCTGGTCTCGA^ 
GACCTCAGGTGATCCACCCACCTCAGCCTCCCAAAGTGCTAGGATTACAGGTGrTGAG- 
CCACTGTGCCCAGCCTCATGGGCm 
TATTCTGQGCTGGeCGAQQTGGCTCATGTCTGTAATCCTAGCACTTTGGGAQQCT-
GAGGTGGGAGGATCACTTGAGCCCAGGAGTTCGAGAACAGCTTGGGCAATATAGTGA- 
GACCCAGTCTCTACAAAAAATAAAAAATTAGCCTGACATGGTGGCGCACACC- 
CGTCGTCCCAGCTACTTGGGAGGCTGAGGCAGGAGGATTACTTGAATGGAAGAGAAG- 
GAGGCTTCAGTGAGCCATGATCATGCCACTGCACTCTAGCCTGGGCAACAGAGTGA-
GACCCAGTCTCAAAAGAAAAAAAAATGCATTTATTTATTCCAAGTGTGTGAGTGCATAG- 
CATTTGTGATTCTGGTCTTTGCTGTTTCCAGAGTTTCAGTGATTT^ 
CAGAGATCCCAACAGCCACTGAATTCAAAATTCCCAGATGCTCAGTTATTTCAAGTTTC- 
CAATATGTTGTGATTGCAGAAATGCTAGGCTGTGCTATTTCAAATTGCTGAGGGGC- 
CAGGACTTTGGAATCCAAAGATTCTATGATGGAGAACTTTAATArrrrr

t ctttttn i gttgg i i ii m igagacaqaqtctcgctctgtcgcccaggctggagtg- 
cagtggtgcgatctcagctcactgcaagctccgcctcccgggttcaggccattcnrcc- 
tgcctcagcctgccaagtagctgggactacgggcgcccgccaccacgcctggctatt- 
ttgt ai i i 1 1 a gtaaagatggggtttcaccgtg'ttagccaggaaggtcttgttctcct" 

gacctcgtgatccgccc^cctcggcctcccaaagtgctgggattacaggtgtgagc-
catcatgcctgacctagaatttcattttaaaagactagaaggaaatggctgggtgcg- 
gtggctcatgtgtgtaatctc^gcactttgggaggctgaggagagtggatcacct- . 
gaggtcaggcaggagttcaagaccagcctggccaacgtggtgaaaccctgtctctac-

taaaaatacaaaaattaggtggccgtggtggtgcacgcctgtaatcccagctactcag- 

gaggccgtggcatgagaatcacttgaacccaggaggcacagttatagtgagctga-
gatggcaccatcgcactccagcctgggtgacagagtgagactccatcicaaaaaag- i 
gaaaaaaaaaagaaagactagaaggaaatattcaaaatgttaatgatggt^^ 
GAGTGGTGTGATTTTGTCCTCTTTCTTCTA1 1 1 I iatttattttccccaagctctctatg- 
gtgttggtgtatttctctatagtggaat^ 

cacagtggctcatgcctggtttgagaccagcctggacaacataatgagaactgtctc-
tactgaaaatgttaaatattatctgggagtggtggtgcatgcctgtagtcccagc- 
cataggggaggctgaggcatgaggatcaattgagcccagtaggtggaggctgcagt- 
gagccatgatcttgccactgcactccagcctgggcaacagagtgagactctgtc- 
tcgataataataaccctctattacaacatatcagtgcatgaatttgtgatttt 

caaaatatgagcatctttmttgtcagatttggtg

cagtctatgatactaactttataattai rrrm i aagagaagagtttccttttattt- 
tattttatttgagacagagtttctctctgttgcccaggctggagtgcagtggcgca- 
atctcggctcactgcagcctctgtctcctaggttcaagcaattctcctgcctgagcc- 
tcccqagtagctgggattacaggcatgcaccaccaggcccagctaatttttgtatttt- 

tagcagagacggggtttcaccatgttggcgaggctagtcttgaactcctgacct-

caagtgatccacccgcctcggcctcccaaggtgctgggattacaggcatgagccac- 
cgtgcccagcctaactttataattctaagatcgtgttc 
ctctaaaatgttactatcctaagac^ 

tttttaagtttctctgtggccaggactctgtgattct acaatgggatgctcagccattt- 
CAACATGTTGTTATTCATCCCCTCTTGATTTCAAAATCCTGAGCCTCAAGGTTCCTTGC-
CTTTACTTTCAGGAGGGCCTAGGAATAGGCATTTTGGGGGGGTCCACCTGACCCCTG- 
CTTCTCTGAGAAGTGATCTCTTCCCGCTGTCTACGCACACGGAGTGTTCAGGACTGT- 
TCCATGTGGCTACAACCCTCTTCCCAGTCAAGATGCAGGGACCAAGATCAGCAGGA- 
GACCATCCCCTGGTCCAATGGTGACAACAGTAAGAGCAGTTAACAGTTATGTGCCAGG-
TATTATGCTAAGCACTACATTAATGTATTTAATCTTGGCGGGGTGTGGTGGCTCACACC- 
TGTAATCCCAGCACTTTGGGAGGCCAGGGCGGGCAGATCACTTGAGGTCAGGAGTT- 
CAAGACCAGCCTAGCCAACACAGTGAAACCCCATCTCTACTAAAAATACAAAAATTAG- 
CCAAGCGTGGTG6CATATGCCTGTAATCCCAGCCACTTGGGAGACTGACGCAGGAGA- 

ATCACHTTAACCCAGGAGGTGGAGTCCAGCACCCAGCCGAG

TATTTATTTATTTATTTTTA I i 1 1 1 A TT I IIH I G AQACGGAATCTTQCTCTGTCACC* 
CAGGCTGGAGTGCAGTGGCGCGATCTCAGCTCACCACAAGCTCCGCCTCCCGGGCT- 
CACGCCATTCTCCTCTCAGCCTCCAGAGTAGCTGGGACTACAGGCGCCCGCCACCAC- 
CCCCAGCTAATTTTTGTATTrTTAGTAGAGACGGGGTTTCACCGT 

GTCTTATCTCCTGACTTCGTGATCGGCCCGCCTCGGCCTCCGAAAATGCTGGGATTA-
CAGGCATGAACCACCACGCCCGGCCTATrTATTTATrrATrTAGAGAT<MAGTCTTG& 
TCTGTCGCCCAGGCTGGAGTGCAGTGGTGCAGTCTTGGCTCACTGCAACCTCCGCCT- 

tccgggtttaagcgattctcttgcctcagcctcctgagtag 
gaccaccacttctcctgrrrgtccttcccagcttctcccccacctccc crttl ' c cct 

ttataagacaggaaaaaaagggagaaagcaaaacgctggaaaaaaacagaagtacga*
taaatagctagatgaccttggcgcgaccatctggtcctggtggttaaaataataata- 
ataatattaatccctgaccaaaactactggtgttatctgtaaattccagacattgtat-
gagaaagcactgtaaaacgttttgttctgttagctgatgtctgtagcccccagt- 
cacgtrcctcacgcttacttgatctatcgtggccgtttcacgtggaccccttagcgtt- 

gtaagcccttaaaagtgctaggaatttcttntcggggagctcgggtgttaagacgct-
gatgctcccggccgaataaaaacctcttccttctttaatccggtgtctgaggagtttt- 

GTCTGTGGCTCGTCCTGCTACAGMTTACAGGCACGCGCCACCGCTCCGGGCTAATT- 

TTTGTATnTTTTAGTAGACAGGGGGTTrCACCATGTTGGTCAGGCTG^ 

TCTGACCTCATGATCCACCGACCTCGGCCTCCCAAAGTGCTGGGATTACAGGCGT- 

GAGCCACCGCGCCCGGCCGAGACTCACTATTTTATAAGAGGAGAGAGCAAAGCCAG-
GAACAGTGGCTCATGCCTCTAACTGCAGCAATTTGGGAGGCTGAGGCAGGTGGAT- 
CATTTGAAGTCAGGAGTTTGAGACCAGCCTGGCCAGCATGGTGAAACCTCATCTCTAC- 
TAAAAATACAAAAATTAGCCAGGAGTGGTGGCATACACTTATAATCCCAGCTACTTGG- 
GAAGCTAAAGCGGGAGGATGGCTTGAACCTGGGAGGCGGAGGTTGCAGTGAGC- 

CGAGGTCAAGCCACTGCACTCCAGCCTGAGTGATGGAGCAAGACTCTGCCTG-

GAAAAAAAAAAAAAATAGAGGAGAGAGCAGAGCAGACACAAGAGACACAGAGACAGAr 
GAGGGAGAGAAGAGAGGGTGACTGCTTTGATTCAGGCAAGACTTCTCAGTCCCAG 
ATGAACCCACTGTTGTGCCAAGACTCAGTCATGTCCAGGTGTATGACTCGAGATTGCT- 
GAAGGAATGCCCGGGGCAGGGCACAGGCACAGGTTATTGGAGAGAAGGAGCAGA- 
OAACATCTCTATGTGGCCAAOACTCCCAGATGGCCCTCCATATAGTCACACACAGC-
TATCCTAAAGACTACATTTCCCAGCATCCCATTGCAATGAGGCTCCTGGCCAGTGG« 
GAGCAGGCAGAGTGATGTATGGAACTCCCAGGTTCTC^ 

TTTCTCTTCTTCTTTCTCTCTTCCTGGCTGGAGGGCAGACTTGGTGACAGCCATCTAG- 
GACCATGAAGGCAGGCTTACTCCCCGATGGATGGCAGAGCCCCAGGTAGATAGAGCC-
TGGGTCCTGACTCCAGTGAGGTGCCTACAGTCCrrGGGCTGCAAACTCTTGGACTTC- 
TACTCAAAAGAGGAGAAAACTTCGATCTCATCTAAGCCACTATATTTGGGGGGCTCT- 
TTGCTACAGCTCCTGGATTCATGTAGCAAACATACCCCGGTTTCCTCCTGTATTACT* 
TACCATGCTCTGCGGCTGCTCTGGTGGGCTGCTCTGGGACGGGGCCGGGGGTGGA- 

ATGGGAGCTGGTGGGGCAGGAGCAGGGGGCCCTGCCCTGGCCTGAGATCCCTCAGT-
GATGGGGGACAGCfCTGGCTCCGGCCCCCCGGGCCCTGGCCCCCCATGACGATG- 
GAAGAGGCGGCTGATGATCTGCTGGTACTGTTTCTTGTGGGTAGGGGGCAGGGCCA- 
CAGCAGGGGCCTGCTCCATGGAGCCCCTGCGTITGAGGGGCCGGGGAATTTCCGC- 
CAACACCCGTGCCACCTCCTCCAGCTCGGGCACCGACTGTGCCTCCGGTGGCAGTG- 

CTGGCTGCAGCCTCGTGGGGCTGAGAGGCC7TGCTACAGGGCCTTCATCCACATCG-
CCAGCCTCCAGCACTGGTGTCAGCAGCCCCTCTATCTCCGGCTCAGGCTCCAGCTCG- 
GTGGGGGGTTTGGGGGGTCCTAGCCGGAACAAGAGCCCATCAGAGGACAGGTCCC- 
CAGGAGACACCCAACACTCCCTCTCCACAACTTCCAGGGCATACAACCAGCACATGAT- 
T7TCTGTGTGACCTCAGGGAAGTTCCTTGCCCTCTCTGGGCTACACTTTCCTTGG 

GTGAATAATATACAATTATGATGCCTCCCATTTATTGAGCAGTTAGTATGTGQCTGGCG-
CTTTACATGCCTACCTTATTGTAATCTCACCACTGCTTTGTGAGGTAGATACACTGC-

CATCTCCACATTACCGAAAGGGAATCTGGGCCTCAGAGAGGACAAGTGAGTTGCC- 
CAAAGCCATGCAGTTGGGACTTGAACTCAGTTCTGGCTGACTCTAGAATCTACTTC- 
TACCAACCGTGATAGATGTGATTTTCTGAGATCCTGAGAGTTTCCTCTCCTAACATCT- 

CAGGCAGAAAACTCCAGCAGGAAGTAGAATCC7GGTGTTTAATGATTTC7TCTCTGTCT-
TACTCATTCTGACAGTAAAGCAGGTGGAAATAAAAATATGCATTATTGGCT- 
GAGTCGAGTGGCTCACACCTGTAATCCCAGAACTTTGGGAGGCCGAGGCAGGGA- 
GATCTCTTGAGATCAGGAGTTTGAGACCAGCCTGGCCAACATGGTAAAACCCTGTCTC- 
TACTAAAAATA- 

CAAAAAAAAAAAAAAAAAAAAAAAAATTAGCTGGGCGTGGTGGCACATGCCTGTAATC-
CCAGCTACTCGGAAGGCTGAGGCACAGGAATCGCTTGAACCCAGGAGGCGGAGGTT- 
GCAGTGAGCCGAGATTGCACCACTGCACCACTGCACTCCAGCCTGGGCAAAAGAGT- 

gagatttcatctcaaaatatatatata^ 

tatatatagtgtatatatatttttatatagtatgcatatacatataaataatacacaca- 
cacacacacggctgagcatggtggctcatgcctgtaatcccagcactttgggaggct-
gaggtgggtggatcacctgaggtcaggggttcgagaccagcctggccaacatgg- 
caaaacctcatctctactaaaaacacaaaaaatta^ 

gtaaccccagctacttgggaagctgaggtaggagaatcgcttgaacctgggaggtg- 
taggatgcagtgagctgaaacctcaccactgcattccagcctgggcaagaagagt- 
GAAACTCCATCTTGGCTGGGCACOOTGGTTCACGCCTGTAATCCCAGCACTTTGG-
GAGGCCGAGGTGGGCAGATCATGAGGTCAGGAGATCGAGACCATCCTGGCTAACAT- 
GATGAAACCCCGTCTCTACTAAAAATACAAAAATTAGCTGGGGGTGGTGGTGGGCGC- 
CTGTAGTCCCAGCCACTCGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCGGGAGG- 
CGGAGCTTGCAGTGAGCAAGCACCACTGCACTCCAACCTGGAAGAAAGAGCGAGAC-
TCTGTCTCAAAAAAAAAGAGTGAAACTCTGTCTCAAAAATAAATAAATAAATAAACCC- 
CAAAACACACACACATACACATTATTTCATTGAAT^ 

GATATTATTAATCTCTCTTCACAGACGGGAAACAGAGTTTCGGACAAGTAATTTATCTT- 
CAGTCACACAGCAAGTTAGCAGTGAAGAGAGACTCCAGCCCATCTGCTTAACTCACT- 

GATCTCACACCTCAAAATATTAATAAATTATTATAACTAATATGGTAGCTATTTAT^

GACTGGGTCTCACTCTGTCACCCAGGCTGGAGTGCAGTGGCGCTATCACAGCTCACT- 
GCAGCCTGGATCTCCCAGGCTTAAATGATCCTCCCACCTCAGCATCCTGAGTAGCTG- 
GGACTACAGGCGCCCACTACCATGCCCGGCAGATTTTTTGTAC^ 
TAAAGTCTATTTTAGTTTCACTATGTTGCGCAGGCTGGTCTTGAACTCCAGAGCTCAAG- 

CAATCCTGTCTGCATTAGCCCACCAAACTGCTAGGATTACAAGGGTGAGCCACGGTG-
CCTGGCTAATATGGTAGCTATTGATAGCITACTATGTATCAGATCCTATTTATT^ 
TATTTTTGAGACAGAGTCTCACCCTGTCACCTGTGCTGGAGTGGAGTGGC^TGATCTT- - 
GGCTCACTGCCACCTCCGCCTCCTTGGCTCAAGCTGAGTAGCTAGGACTACAGTGGT- 
GAGCCACCATGCCCAGCTA A I I HI 1 1 1 M I 1 I I 1 111 U M I GATAGAGATGGGATTT- 

CATCATGTTGTCCAGGCTGGTCTTGAACTCCTGACCTCAAGTGATCTGCCCACCTCGGt
CCTCCCAAAGTGCTGGGATTACAGGTGTGAGCMCTGCACCTGGCCCATCAGGTGCT t 
GTTTrAAAGGCTTTATATGAATrfAATAACATATGTCAATAGGA 

TTGC CI rTTTTTTTTTTTTTTTTTI IG AGGCAGAGTCTCCCCQTCACCCAGGATGGACT- 
GCAGTGGCGCAATCTCGGCTCACTGCAACCTCCACCTCCCGGGTCCAAGTGATTCTC- 

CTGCCTCAGCCTCCCAAGTAGCTGGGA

ATTT1 IGT AI I I I TAGTAGAGATGGGGTTTCATATTGGCGAGGCTGGTCTCQAACTTCT- 
GACTTTGTGATCCGCCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGCATGAGC- 
CACCGTGCCCGGCCCATTATTTCCCTTTTACACTCAAGAAAATTGAGGCCCAGTGAG- 
GTTAAGTGACTTGCCCAAGGTCACACAGCGTGGAACCAGGCAGTCTGGCTTCAGG- 

GTCCACACTTAACCTTTGAGCTATCCCTGGCTCCTA

GGCCTAGCTCTCTGCAGGGACAGTGCTTGTAAAGAGGCATTTGGCTGTGATCTCCC- 
CACCTCCCAGGGCTGGTCTGGTCCCCCTGCCATTTGTCCTCCCTTCACCCAGTCCTC- 
TAGGGCCCTCATTGCTGACTCACCTTCGTTCACAGGG6CCATGTCTGTTGGGGATGC- 
TGGGGGGCTGGGGTAGGGGTTTGGGGTTGGGTCTGGGGCTGTGGGGGCAGCTGG- 

GGCTGTGGTTGTGATTGTGGCTGGGGCTGTGGTTGTGGTTGGGGCTGCAGCTTAGG-
CGGGGGTGOTCGGGTGAAGAGGGGGGACCCAGGGAGCATGGCGCGGCTGGCCC-
QGTGCTCCCAGAAGGCGTTCTGCAGCTTGAAGATCATGCTGAGGGGGATGGGACGC- 
TGGCGCGGGGCCCCGCGGGGCTGGGGGCTGGAGGGGGGCATGGGGATGCGGCT- 
GACGGGCTGCCAGCTGCGAGGCAAAGTGCCCGACGGCCCCGCGGAGCCCAGCGAG- 
CGCCGGTAGCTGCCCGCQTCTGAACGCCGGTCGCTGGCCAGAGGAGAGACCTTGTA-
ATTGCGCGGCAGGGTGGCGCTAGTGAGGTTGTCCTGGGGAAGAGGGAAGGGAr 
GAAGGGGATCGGGTGAGAGAGGGAAGGTGGAGGGGAGGTAAAGACAAAAGACGA- 
GAAGGGAGAGGAGGTGAGGGAAGCCCTGGGAGTGAGGGAGAAGAAAGGGTGAG- 
GAAGGAGCAGAAACCCAGCACAGTGAAGGGAGAGCGTGGGAACGGGCGCCGAGAO
CCAGATCGCAGCCCCGAGGGGGAGACTGGCCTTGACCCCGCTCCCCCACCCCACTC- 
CTCGACCTTCCCCAGCCTCTCCTCCCCAGGCGTCGCCTCCTCACCTTGCCGGTGCCC- 
CCCAGTCCATCCAGGCTGCTCTCCCTCCAAGGCAACAGCTGCAGGCTCGGCGAGG- 
CAGGCCTTGCGAAGACGTCCAGGCCTGCGGGGCGGGAATCATTAGGGTCTGTGGGG- 

CTGCCTCTCCTCCGGGTCCTCCATTCCCCGGGCCTCCACCACTCACGTTCATAGC-

TCGCTGTCTGCGAAGGCTTCTTCTCGTACGCCACGTCCAGGTCAGACTCGTTCCAGG- 
CTTTCGGAGGCCGCCGGCGCAGCGTCAGGTCGTCTGGGGAGAAGTTTCCAGGGAG- 
GATGAGACGGGAGGGGTGGCGAGCCCCGGATCCTGCCCGCTTTGACCCCGCGAGT- 
CAAAGGCCCCGCGAGGGGCCCCTGGG7TCACCTTGCGCGCGCAGAGGCGGGGCGA- 

ATGCGCTGCCGCCGGAGCCTAGCAGGGAGCTCCCGAAGGCGGACGCTGGCG-

CGTCGTAGGCTGTGGCAGGGGGGdGCGGTGACGGCCCACGCTCGGGGAAGAAGGC- 
CTGGGGCCCCTCCGCCAGGGGGCTGCCGCGGGGGGAGCCTGCGCGGCCCAG- 
GAAGTCGAAAGGCGTGGGGGGACCCTGCTGGCGGAGCGGGCCTGGCCCGGGCCG- 
CGGGGAGGGCGCACGGCCGAGGGAGCTGCCTGCGCCATCGAAGGCGCGGGGCCG- 

20 GGGCGAGGTCGCGCGGTCCAGGCTGCCGTAGGCGTCCGGCTGCAGGTAGAGCGGG- 
GTGCGCGGCGACGACGGCCGTCCCTTGGGGGACAGCGGGCTGTAGGGGTGTAGG- 
GTTGGGGCACTCTCTGATCGTCCGAACGGGGTGTCTGCGCCGTCGGTGGCCGCCT- 
TCCGGGGGGACCCTCGGCTGCCGAAGGGCTCAGGGATCGAGCTGGAGCTGTACCG- 
GGGCGGCTGTGGGGAGGCCAGGGCATTGAGGGATGGATCAAAGGAGACATTAGTG- 

25 GAAGGGTTGGTGTGTGGGCGGGGGTGTCAAGAGAGATCACTGGAGGTCAACCCA- 

GAGGAGGCTGACCGGCCATGGAAATTCAGGCACAGAGAGCCCAGGTGAGTAGTGGT- 
GGGGAGACAGCCCTGAATCAGCACTGTGGCTAGCCCATTACTCTATGTCACCTTTATG- 
C^ACTTAGGTAAACACCTCTTTCCTTCTGAGGGTCCCTTTAGATGTCCACTTCCACT 
GTCCCCTCTTTTCTATTTCTTT Ci I ICl I ICI I I CTCTCT CI rTCTTTTCTTTC TI l Ui I- 

30 TCCTCTCTCTCCTTCCTTCCTTTCTCTCTCTCTCCTTCCCTC 

TTGCTTGCTTTCTCTCTCTCTC 1 1 I d I IO I I I Cl 1 I CTTT C1 I \ V\ I I CM 1C TTTCTT- 
TCTTTrCTATCTCGGCTCATTGCAGCCTCAACCTCCCTGGCTTAGTGTGArC 
TTCAGCCTCCCAAGTAGCTGGGATTACAGGTATGCACCACCACACCTGGCTAACTTTT« 
GTATTTTTAGTAGAGACAGGGTTTCACCATGTTAGCCAGGCTGGTCTTAAACTCCT- 

35 GACCTCAAGtGATCCGCCTGTCTCTGAAAGTGTTG 

CGTGCCCAGCCAGATTTTrAAAAAATCATTTGTAGAGGCTGGTCTCAAACTCTTAGTCT 
CAAGGAATTCTCTCACCTCGCCTTCCAAAGTGCTGGGATTCCAGGTCTGAGCCATCG- 
CGCCTGGCCTGGTCTCCTTTTTTCAAGTTCCCTTGAAGAGCCCACAACCTGCATA^ 
TATATGGGGCAATTTTGCCTGAAATCCAGGCCTCTGGTCTGGACTGTGGCGAGAGGC-
tggctttggagatcaaggtgggaaccaggcttaccctagaagggggtccggcctg-

cgggccaggaggcgcgggagagtctgaccacagcgactccagctgcttggtcagtt- 

catccaccttggccgccgcc^tgtcgagctccatctgcttgagatccatgtgtttcat- 

ggccagcgctgggaaggtgggagtggaggtaaggacctggcctcctggcaggggo 

cggcctcagcacccctcgccc^ctgccgaggtccccgcctcgccagccccgccccc- 

tactccagcttacactggaagttcatgtccagaaagtcccgcgcgctctggaatgcc- 

tcgctgtccatggtgccggccggagcgggcgcctgcatggtggggagggagggag- 

CTGGCTAAGACCCCGCCCCTCTAGACCCCGCCCTCAGGGAGTCAGACGCCGTCAG- 

GAGCGGGACAACGCCTCAACTCAGTTCCTTCCCCTGGAAGCCCTTTACCCTTTCACC- 

TCCCCAGCTGGGAAATGCCAACTCCTCCAAAGCCAAGTCCATGCGCCACGGAr 

GAAGTCC AAACCCAGTCTAAAACCTCCGGAATTCACTTTCTCTTT C \ Mill 1 CTTTTCT- 

rTTTTTTTTTTTTTTI GTGTATGTQTGTG AGACAGAGTCTCGCTCTGTCGCCCAGGCGG^ 

GAGTGCAATGACGCGATCTTGGCTCACTGCAACCTCCGCCTCCCGGGTTGAAGCAA- 

ATCTTCTGCCTAGCTGGGACTAGAAGCGCQCGCCATTATGCCCQGCTA AI ITTT G- 

TAGTTCTGGGATTACAGGAGTGAGTCTCCGCGCCCGGCCGTGTCCATCTCTTTATCT- 

CAGTCCTAAGACCTGAATCACTCCTTGAAOAATTATCTATTGATCACCTACAATGTGC- 

CGGTAAACATAGGATGGAATAACTATGAATTACTGAATGT1TACTAGGGACCAGGACG- 

CACTGTGCTAGATCCTG I HUGH I GTTTTTGAGATGGTGTCTCGCATTTTCGCC- 

CAGGCTGGAGTGCAQTGGCGCGATCT.CGGCTCACTGCAAGCTCCGCCTCCAGGGTT- 

CATGCCAGTCTCCTGTCTGAGCCTCCCGAGTAGCTG 

CATGCCTQGCTAAATTTTTGT AI I LI i A GTAGAGACGGGGTTTCACCGTGTCAGCCAG- 

GATGGTCTCGATCTCCTGACCGCGTGATCCATCTGCCTCGGCCTCCCAAAGTGCTGG- 

GATTACAGGCGTGAGCCACCGCGCCCGGCCCTTGTTTTT G I INI IAATAATAATTCT- 

GCTGTCTGCTGTGTACTAGAACCCATGCCTACTGCTTGGGGTATAATGTAGTAAATG- 

TAGTAAAAACAATATCCGCCGGGCGCGGTGGCTCACGCCTGTAATyCCAGCACTITG- 

GGAGGCCAAGGAGGGCGGATCACGAGGTCAGGAGAGCGAGACCATCCTGGCTAA- 

CATGGTGAAACCCCGTCTCTACT AAAAATACCAAAAATTAGCCAGGCGTGGTGATG- 

GACGCCTGTAGTCCCAGCTACTCGGGAGGCTGAGGCAGGAGAACGGCGTGAACCCG- 

GGAGGTGGAGCTTGAACTGAGCGGAGATCGCGCCACTGCACTCCAGCCTGGGCGA- 

CAGTGCGAGACTCCGTCTTAAAACAAACAAATAAATAAATATGTTTAAAACAACAACAA- 

CAATAACGAGCCAGGCGCGGTGGTTCACTCCTGTAACCCGAGCACTTTGGGAGGC- 

CGAGGTGGATGGATCGCTTGAAGCCAGGAGACCAGCCTGGCCAATATGGTGAAACC- 

GCGTCTCTACAAAAAAATACAAAAGTTAGCTGGGCATGGTGGCATGTGCCTGTAATCC- 

CAGCTACTCAGGAGGCTGAGGCACAAGGCTCACTTGAACCTGGGAGGCACAGGTTG- 

CAGTGAGCATAGATTGTGTCACTGCACTGCAGCTTGGGTGACAGAGCGAGGCTCTAT- 

TTAAAAAAAAAAAAATTAATTGAGGGGCCACTCCCTTCTAGAGTGGTGAGAAATC 

CGTGqACCGAAAGCTTCATTTGATGGTCAAAACCACCCTAGCAGGCAAGAAAGCAT- 

GGCTCAGAAACATATGTTCAAGGTCACCCTGCAAGAAGTCGGTAGTAATCGGTTTCA- 

CACCCGCATCTAACTTATTCTGGGTCATCTCTACCAGA7TAGAGGGGTCCTAGAGG- 




GAAGCGACTGCTCAGCTTCCTTTCCCTAGGGTCCCCATTCAOTGGAGGTCTGGCTCT- 
CACTGACCCATTGTTAGCAAGAGGAACAGGQAGGTGGCCAGGGGTGGAGGGGCAGC- 
TGTGGTCACTGGCCCAGTGGGAGGGAGCTAGGCCACTAGGAACCGGTCAGGCCAG- 
CACCATCCCTATCCCCATGCTAGCCACCACACCCACCAGCTCTGCCACCTCCCTGCT- 
5 GCATCGACCACTTAGCTCTGGCAGTATAGGCAGCAGGGCAGGCTGGGGCATGCTGA. 
TACCCGGCTCTGTCTGGGAAGTCGAAGGAACAGAACCTGTTCAGGCTGGCGGCTCAT' 
TTGGATGAACAGGGAGTGTGTGACG1TGGGCGTTGAGTCCTCTCCACTCCCTGGGCC- 
TCAGTCTCCCCAACATCAAAGAAGAAGGCAAATCACCI 1 1 1 1 1 I M 1 1 1 1 1 1 G AGATAG- 
. GGTCTCGCTCTGTAACCCAGGCTAGAATTGTGACTCACTACAGCCTCTTGACCTCG- 

10 CAGCTCAAGTGGTCCTCCCACCTCAGCCTCCTGAGTAGCTGAGACTATAGGTATAGCC- 
TCGCACCACCAC ACCCAGCTA A i H I 1 1 1 1 I I I M M 1 1 U i M M M I N I ) M 1 GA- 
GACGGAGTCTTGCTCTGTCGCCCAGGCTGGAGTTCAGTGGCGGGATCTCGGCTCAC- 
TQCAAGCTCCGCCTCCCGGGTTCACGCCATTCTCCCGCCTCAGCCTCCCAAGTAGCT- 
GGGACTACAGGCGCCCGCCACTACGCCCGGCrrAATTTTTGTATTTTAGTAGAGACQG- 

15 GGTT7CACCATTTTAGCCGGGATGGTCTCGATCT 

TCGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGCGCCCGGCCACCCAG^ 
CTAATTTTTTAAAAACATTTTC^ 

GTCAGGAGCTCGAGACCATCCTGGCTAACACAGGTGAAACCCTGTCTCTAC3TAAAAA- 
ATACAAAAAAATTAGCTGGGCGTGGTGGCGGGCGCCTGTAGTCCCAGCTACTCG& 

20 GAGGCTGAGGCAGGAGAATGGTGTGAACCAGGGAGGCGGAGCTTTCAGTGAGCCGA- 
GATCGCGCCACTGCACTCCAGCCTCGGAGACAGAGCGAGACTCCGTCC- . 
CAAAAAAAAAAAAAAAAAAAATTTGTAGAGACAGATCAAGTCTCACTTTGTTGCTCAGG- 
CTGGTTnTGAACTCCTGGGCTCAAGCAATCCTCCCGCCTCAGCCTCCCAAAGTGCT- 
GAGATTACAGGCATGAGCCACCACACCTGGCCAAATCAGCTATTCTGAAAGGCCCCTT- 

25 TAATCTCTATGAGCCCCAGACT7TCAAACTGTAAGGACCTTAGGACTGTAACTAAAGT- 
TCTACAGAGCCTAAACCCCTCAGCTAAAGAGCCTATTGTTGGAAAGTTCTGAGTCCAA- 
GATTCTATCTTTGGAACATTCTAGAATTCTCCAATTTGTCTAACCCAGAATTCTGAGTCT- 
TTCTGTACCACATTCTACCTAACCCAGGGTTGCACTGCTCTGGAAGTCTAGATGGATG- 
GTATAGTGCAGCTGGTAAAAGCATGAGTAAGAAGTCAGACTTCAAAAATTCAAATCT- 

30 GAGGGCCGGGCATGGTAGCTTCTGCCTGTAATCCTTGCACTTTGGGAGGCCGAGGG- 
GGGAGGATCACTTGAGGCCAGGAGTTCAAGACCAACATGGCCAACACAATGAGACCC- 
CATTTCTTAAAAAAAATTAAAATAAAATCATCAAATCTGGCAGCACCACCGTCCAACCC- 
TGACCACAGTACCTCAGTCTCGTAATCCGTAAAATGGGGATGAAAGTTCACCTCATAG- 
GACTACTGTAAGAATCCACCTGGTCAGAAGGTGCAGGAAGAATTCAGAGCTCTGAGA- 

35 ATTGAGGCGTCAGGAAGAAGAGAGTACAGGAATAAAAACTCGGGCATTTAGAATTTCA- 
GAGATACAC/VHACAATACTTTGTTAACTGTTAAAATAGATAAATGAGCAAGTCT^ 
CAGCCCTAATGCCAGCTGTAAGTGACTC I mini IC TTTTGGTAQAGATTTAGTCTC- 
TCTCGCGCCTGTGGTTAGGCTGGTCTCGAACTCCTAGCCTCATGGGATCCTCCCCGG- 
CTCGATCTCCCAAAGTATTGGGATTACAGGCGTGAGCACGGCGCCATGATCCCCAA-
ATTTCCAAGATTCTCAGATTCCATACTGACATTCTCTGGCTCTCAGGAAATGCCAACC^
TGGGTGTGGGGCTGTCGCGGGGACAGGCGGTGGGGACGTCGGAGCCACCAGGGGG- 
CGGTCACGCCCGGACCCCCGCCAGGAGGGCGGACTGCGCCTGAGCTCAGGCCCGG- 
GGAATGCGCAGCGGGCCCGGGCAGGTGCTGTACATCCCGGGGCAAGGGAGCTGGG- 
5 CCGGGCGGGGTACAAGGGCGGGGCGCGGGGGTGGCGCGGGCCGTGTGTCTGTTCC- 
CAGGCCTCTGCCCCTGACCTCTGCCTCCGAGTCCTCTCCCATGTGCTCCCCTCTAGC- 
TCTAGCTCCGAGCTCTCCCGCGGGCTCTGGGCCAGCCGCAGGTACTCTCCCCTGGG- 
CTCCTCTCTCCGCTCCACCCCT 

CAGGTTCTTTAGGGCTAAGGATCCTGTGGACTTCCTGGAGGAGTCATCnTCAGTA^ 

1 0 GAACCGGGTCAGAGAGCCAGACTGAGCTGGGAACACCCAGGCTGGACTCCTACAGD- 
CCTGTCGGGTCAGACTGAATCTGGAGAGGCTCCACTGTC 
TCCTTTGTGGACGTCTATGGAATGGGCTAGGGCCTTTCnTGCTCTMGCCT 
GGCTTGTTATTTAGCTTCTCTGTGCCTGTTTCCTCATGTGGACCATGGGAAG 
ATACCTTCGCCTCAAAGGGGTATGAGGATTGAGTGACATAATTTATAAGCCGTGATTA- 

1 5 GAACAATGCAGTGCGCGAAATAAAGTTCACACATACAGGATTCATAATTACCAGAT. 

GTCCTTGGCTGTTCATTATAATAACACAGGGTCTGGCAACAGAGTGAGGGGTCCAGAC- . 
TCAATGTAAI I 1 1 I 1 1 1 1 CCCCTAAAAGGGCCCTTTCAACTCTTTCTGAGATCATACAAG-- 
CCCTGAGTTTTGACACCCAGGGTCTCAACTTCCTGAGCCCTTGCXTCTCAGAGTCC- 
TAAATTTCCCCTGTACATTCCTGAGTCTGGCCAGTGATCACCCTCAGTCACTTAGG- 

20 ' GACGGGAGGGCTGGGAGAGCCCTGGAAGATTCCAGACAGAAGCTGGCAAAAGCC- . 

CAGGGTGTGGGCAATATCCACTCTCCAGCCTCCGTTTCTCCACTCGTAATGAG- . 
*' GAGTCCTTCCCTGGGGTCAGCVVAACCTTATTCAAAGGGAGACCT^ 

GATTCCTCTAGACAATGTCAGCTTTCCTACCTACCTACCTACCAGCTCTGAGCTTGG- 
TACACCCAGAGCCCTGTTTTGGCAACCAOGGTTATT AI 1 1 TT AATTTC ATTTC AG QT- 

25 TATCATCAAATGCCCTTCAAGCCCAGACATTGGGAAAGACTCCTCTCTCATCAGATGC- 
TCGCCTCCCCCATTCTGTTTTTAATCCCCCTTCrTTAGGACGCATGGGGGTTGAGA- 
GAACGGGGAGATAGACAGAGGGAGGTGCCTGGTCCTGCCCTCCCCCCGCCTCAAG- 
GACAGACAGACACCTCCAGAATTAGCCTCTGTCCCTCCTTATCTCCCACAATACCC- 
CAGGTCAGACAGATGGGCGTGGAGGTGACATTTCTCACCTCAGGGTCAGGGCAAG- 

30 GAGCCCTGAGGCAGAAGGTTAGTCaGaAAATCTGGCGGGGGCGGATGGAATCC- 

CGTCCCCCAGAGAGCTGCAGAAGAAGGAGGAGGCAGAATCCTGACCCTACAAACTC- 
TACTGCCTGTGTGAGCTCCMGCCTC^^ 

ATGCCCGGCTATGCAMCCTCCCAGMTCCAATAGCCGCTTTCCGGAATTCTGCCCT- 
GGGTTCTAGAACTACCTCTGCAAACCCAGCTGTTTCCCACCCCATAAGGCAATAGGG- 
35 GAGCCCACCTCCGCCAGGGGGTGCCCTAGGGCGGATGTCCCTTCTCTGGTTAG'G- 
CAGGTCTGACGCCCAGGTTAATGACATGTTGGGTTCGCTCAGCGGCACAGAGGAG- 
GTTGGAGATPTGCCTCGGTGTTTTCTCTCCTACCCC6CCCCCATCCCCGAGC- 
CGAAAAGTCGGGGGAGAGCCGGGACACAGCCTCCGGAGGGACCCCGGGTACCT- 
GTCCTGCTCCACTTCAGGAACCCAGGCTCCACTATCCCTGCCCCACCCTTAATTCTGO
TCAGAGACCTAGAAGATCGGTCGA6ACAGCAGCTTGAGQCTG6CAGGGTGGTCACC-
fcATTCCACCTTGAGCCCCACCAGTCTGAGCCTCTCATTTCTGACCAAGACTCGGGGAT- 
TCGAACCCCTATACTACCCAAAGACTCGGCTTCCTAGAGCCCCCCAGTTCGAGGGAC- 
TCAGGAATTCCAGCTCCAACGTCTCCCCGGGATGAAGGGGTAGAATCCCTCCATTC- 
5 CAAGAATTCAGGCATCCGAACCCGCTTTCCTTCCCTCCAGTAAAACAGGCAACGGAGT- 
TTCCTTCTAAGGATCCAGGTGTCGGCGCGCCCCAAATTCCGCCCTGGGACCTGG- 
CGTCCGAGTCCCGTCCCAATCCTCCCAGGGACGCGGGTGTTGGG Cn 1 1 ICAGGGCC- 
TCTGGTCCCCA6GAGGGTGAAACTCACGGATCCGGGCAGATCCTGGCACCTGGGGG- 
CTTCCTCCAGCTCGGGCTCCGGCTTGGGGAGCGGAGAACGGGGCGGGGCAGGAGC- 

1 0 TG6GAACAGGTTAGACGACGTGACTTGGGCTGGAGGGAGGCGGGTCCCGGTGGG- 

GAGGGGGAGCCAAGGTCGCCTCGAGCACOTTGGGACTTGTAGTCCCGGAGGGACAG- 
GACGTAGCCCAAGACGATCCCATTTGGATTCACCCAGAGTCCATTTCACAGACAG- 
GAAGGGCGAGGCCCAGAAGCCGAGAGCGACCAGGCCAGGGAGATACAGAAGAGG- 
CGAGACGCCTGCCTCGCTGTGGCTGGAGACTGACTCCTGAGCCCTTGCCCCACCCCT- 

1 5 TCAGGCGCACTATCCCCTTTCCTGATCAGTATCCCCCAGGGTCTCTGAGCCCGAATC- 
TCCCCGTCGATAAAAAGCGCGGGTTGGATCTTCAAAGGATGTCCCAGCAAGAGTT- 
CAAAATCTTAGTTTGGACTACAACCCCCAGCAGCCTCCGCGACCGCCCTCGGGCGAC- 
TCTTTGCCTCGGGTCCTGTGGGAATTGTAGTCCTGGAGCCCGCAGGGCTGCACCC- . 

CGGTGTCTCTCTCGCCCACGCGAAGGAAACCGTCTGGAGATCCTGGATAGGGGAAA- 
20 CATTTCCCCTTCCCCTTGACrc^ 

gaaggggtgccccaattctggagtaggatcctaaatcttggcagagggggcgg- 
: gaagtggcgctgacacactggccaggaatgcagtcgggtcaccctgtctagccac- 
cgtctcgcggctccaaccgccgcccaacgcggggcggccccagtgggaagg- 
gaagtgggtgcgtcccccaaatcrrgtgtccaggtgccgctgtttacacgctccctgg- 
25 ggcagggaggagtcgccga7caggtcccttcgtgaaagtcatcgaggt7tcccacg- 
catgagactaaacccccgagggcatctacaagtcccatttgatccacaaacgctacac* 
cgtgcccagcaccactccacgcgtgtggggctcctgggtccgaggctccgcccr 

TCGAGAACCACAAGCTCCTCCCCCTATGTTTCCCGCTCCCCCGGAGTCCAGAAGCC^- 
CGCCCCTGGCTGGAACTTCACGCCCTCCGGACGGATTGCCCCTATTTCTCCATTTTCC- 

30 CGCTTCTCCCAGTCAAGTTCTGAACTTGTGAGGCATCTGGGCCTCCCCAGAAGACATT- 
TAACACAGAAAGCACAGCCCTACTAACTAGTATTCTTACCTGTCTCTTCAAGAATTTCA- 
GACCAATCGACCGTCCTGTCTCTTTAAGGCTTAGGAAGAGCAGTGTGGCTGCCCCTT- 
TAAGGAGGCGTTGCAACAAACCATATTGGACAGACGATGGGGGCGACCCATCGG- 
GACCCGACGGGCCTCTGACTCCAGCAATACAGCGAATCAGCGGCTTTCGGGAATA- 

35 CATTTTTCGGAAAAAGACtTCTTCCTCQ G 1 1 1 T CTGCTCTGCACACGTTGAAATTTTCO 
CCAGTTTTTCCTGCAGATCGGGAGTCGAGCAATGCCTACCCCCGCGCTCCCGCAC- 
CAGTTGGGCGCTCCCGGATGATGCCCTACCCCTTTGGATCCACGTGGTCTGCAACCT- 
GGTGCGAGCAGCCCGGGCTACAGGGTTGCCTGAGGTGTGGGTCCCAGGATGGAG- 
GAGCCCCAGGCCGGCGGTGAGGGTGCGGGTTGACGGGGTGCGGAGGGTGCGTTG-
GTGGAAGGAGAAAGGGGCGTCCGAGAGGGTTCGGGCGGAAAAGGAGGCGTACGTG-

CAAGCAGGACTTGCGAAGAGCGTGCATTCCCAGTGGGCGAACGGGAATTCGAACG- 

GAGAGAGGGTTATCTTGTGGGGGGCTACCCGTGGAGAGCAAGGCGCCCCCAGGG- 

GTTGGATCGGTGAAATTGAOGTCGCCCCTGGGGAACAGGTGGGCAGAAAGGA. 

GAAACCAGGTTGAGGGGACTGGAGTGCTCACGAGGTTAAGACCAATGGACCGA. 

TAGGCGCGCCCTGCAAGATTGGACCGGCAAGGAGGTGTCAGTCGACCCCATTTCCCC- 

TTCTGCTGCAGATGCTGCTCGGTTCTCTTGTCCCCCCAACTTTACCGCGAAGCCCC' 

CAGCCTCAGAGTCCCCTCGTTTCTCCTTGGAGGCGCTGACGGGTCCAGATACGGAGC- 

TGTGGCTTATTCAGGCCCCTGCAGACTTTGCCCCAGAATGGTGAGTGGTCTTGTT- 

GACGGAAAAGAGGGTCCCGGTCCAGACCCCAAGAGCGGGTTCTTGAATTTGTCACAG- 

GAAAGAATTAGAGGTGAGTCACAGAGCACAGTGAAAGAAACAAGTTTATTGGAAAC- 

TACTCCTTTACAGAGTAGAGTGTCCTCAGAAAGCAGGGGGAGAAACCCACAGCCCT- 

TTGTTAGTATTTCTACTTATAAGAAACTATAAGGAACTATAGTTAAACTTGGAGTGTG' 

CAGATAAGCTCACTAAAGGTAGGGGCTATTGGTGTTATCCACGACCATTAATCCTG- 

CAACCTAAGCTTGCTCATTTATGTTATATTTAAGTAATGGGGGCTGCATTCTTAG 

CATTTGGACATTCTGCAGGCTTGGTGGAACATGTTCTGTATGGCCATAAATATTCTGTA- 

ATTATAATTGGTGGTCAGCCTGGGATG^ 

GTAAGTGCCTAGCTACTCACTTTAAGATGGAGTCACTCTAGTC 

CAGAGGCCAGCCAGGCGCAGTGGCTGGTGCCTGTAATCCCATCCTTTGGGAGGC- ; 
• CGAGGCGAGCAGATCACTTGAGGTCAGGAGTTCAAGACCAGCCTGGCCAACATAGT- : 
GAAATTGTCTCTACTAAAAATACAAAAATTGGCTGGGCGTGGTGGCAGGTGCCTGTA- 
ATCCCAGCTACTTGAGAGGCTGAGGCAGGAGAATCGCTTGAACCCAGGAGGTGGA- - 
CATTGCAGTGAGCCGAGATCATGCCACTGCACTCCAGCCTAGGCAACAGAGCAAGAC- 
TCTCTCAAAAAAAAACAAAAAAAAAATCAAAAAACCTTCCCTCTCCTGTTCCACTTAAG- 
CCTCTGCCCTCCCTGTTTCTCTCTGTAGCTTCAATGGGCGGCATGTGGGTCTGTCTGG- 
CTCCCAGATCGTCAAGGGCAAATTGGCAGGCAAGCGGCACCGCTATCGAGTCCTCAG- 
CAGCTGTCCCCAAGCTGGAGAAGCGACCCTGCTGGCCCCCTCAACGGAGGCAGGAG- 
GTGGACTCACCTGTGCCTCAGCCCCCCAGGGCACCCTAAGGATCCTTGAGGGTCCC- 
CAGCAATCCCTGTCAGGGAGCCCTCTGCAGCCCATCCCAGCAAGTCCCCCACCACA^ 
GATCCCTCCTGGCCTGAGGCCTCGGTTCTGTGCCTTTGGGGGCAACCCACCAGTCA- 
CAG GGCCTAG GTCAGCCTTGG C CCCCAAC CTGCTCACCTCAGGG AAGAAGAAAAAG- 
GAGATGGAGGTGACAGAGGCCCCAGTCACTCAGGAGGCAGTGAATGGGCACGGGGC- 
CCTGGAGGTGGACATGGCTTTGGGGTCGCCAGAAATGGATGTGCGGAAGAAGAA- 
GAAGAAAAAAAATCAGCAGCTGAAAGAACCAGAGGCAGCAGGGCCTGTGGGGACA- 
GAGCCCACAGTGGAGACACTGGAGCCTCTGGGAGTGCTGTTCCCGTCCACCACCAA- * 

gaagaggaagaagcccaaagggaaagaaaccttcgagccagaagacaagacagt- 
gaagcaggaacagattaacactgagcctctagaagacacagtcctgtccccgac- 
caaaaagagaaagaggcaaaaggggacggaagggatggagccagaggagggggt- 
gacagttgagtctcagccaca6gtgaaggtggagccactggaggaagccatccctct-
GCCCCCTACGAAGAAGAGGAAAAAAGAAAAGGGACAGATGGCAATGATGGAGCCAG-
GGACGGAGGCGATGGAGCCAGTGGAGCCGGAGATGAAGCCTCTGGAGTCCCCAGG- 
GGGGACCATGGCGCCTCAACAGCCAGAAGGAGCGAAGCCTCAGGCCCAGGCAGCTC- 
TGGCAGCTCCGAAAAAGAAGACGAAGAAAGAAAAACAGCAAGATGCCACAGTGGAGC- 
5 CAGAGACAGAGGTGGTGGGGCCTGAGCTGCCGGATGACCTTGAGCCTCAGGCAGO 
TCCCACATCCACCAAGAAGAAGAAGAAGAAGAAAGAGAGAGGTCACACAGTGACT- 
GaGCCAATTCAGCCACTAGAGCCTGAACTGCCAGGGGAGGGACAGCCTGAAGCCAG- 

ggcaactccgggatccaccaagaagaggaagaagcagagtcaggaaagccggatgc- 
cagagacagtgccccaagaggagatgccagggccgccactgaattcagagtctggg- 
1 o gaggaggctcccacaggccgggacaagaagcggaagcagcagcagcagcagcct- 
gtgtagtctgcccccgggaaactgaggaactaaagaaagctgaaggtgcccacctg- 
ggccaccagaaggtgacacccccagaatccctccccagagactgcaccagcgcagcc 




Example 7 

The cases and controls in Example 6 had been individually matched with respect to
age, menopausal status and hormone treatment. Therefore, it was possible to make
a paired analysis. This generally reduces the possibility of bias and confounding, but
often produces less significant results. When the "high-risk" group, i.e.
RAM** ASE-1e3 GG ERCC1 GG, was analysed versus all other genotypes, we found a rate
ratio (RR) = 1.64, confidence interval (CI) = 1.17-2.29, with a level of significance
p = 0.004. Thus, the "high-risk" genotype was clearly overrepresented among the
breast cancer cases.
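
The relation between a reported rate ratio, its confidence interval and its p-value can be checked with a short calculation. The sketch below is not the paired analysis used in this example; it assumes a Wald-type 95 % interval that is symmetric on the log scale, and the function name wald_check is hypothetical.

```python
import math

def wald_check(rr, ci_low, ci_high, z_crit=1.959964):
    """Recover SE(log RR), z and a two-sided p-value from a reported RR and 95 % CI.

    Assumption: the interval is exp(log(RR) +/- z_crit * SE), i.e. Wald-type.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(rr) / se
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail probability
    midpoint = math.sqrt(ci_low * ci_high)  # geometric midpoint of the interval
    return se, z, p, midpoint

se, z, p, midpoint = wald_check(1.64, 1.17, 2.29)
print(f"SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}, midpoint = {midpoint:.2f}")
```

Under this assumption the geometric midpoint of the interval falls on the reported RR of 1.64 and the p-value rounds to 0.004, so the three reported figures are mutually consistent.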

Example 8 

In the data of Example 7, the "high-risk" group, i.e. RAM'* ASE-1e3 GG ERCC1 GG,
was further analysed versus all other genotypes among those pairs that were less
than 55 years of age. This increased the difference dramatically, indicating that the
high-risk genotype predisposes to early breast cancer (rate ratio (RR) = 9.5, confidence
interval (CI) = 2.21-40.79, level of significance p = 0.003). In older age brackets,
the RR was still above 1, but not significantly so. Thus, the combination of the
three SNPs allows for the definition of a high-risk group for early breast cancer.

Example 9 

Blood samples were collected from a large number of Danish citizens and frozen
(Example 6). The persons were also interviewed about a number of issues including
smoking habits. After a number of years those persons who got lung cancer in the
intervening period were identified, as well as a set of matched controls. DNA was
purified from the blood samples and a number of polymorphisms in and around the
region, namely XPDe10, XPDe23, RAIi1, ASE1e1 and ERCC1e4, were typed. The
three latter polymorphisms were combined into a "high-risk" group that was homozygous
for the high-risk alleles of all three polymorphisms: RAM** ASE1e1 GG
ERCC1e4 GG. All other genotypes at the three loci were combined into a low-risk
group (Example 6). XPDe10 and XPDe23 were not combined with other markers.
The results are shown in Table 11. It is clear that the "high-risk" genotype is associated
with lung cancer in the youngest age group. XPDe23 shows signs of being associated
in all age groups, while XPDe10 did not appear to relate to the disease.
Therefore we recalculated the results for the youngest age group without XPDe10.
Table 12 shows the results. Calculated this way, both polymorphisms were related to
the risk of lung cancer.

Table 11. The risk of lung cancer in three different age groups in association with
the high-risk genotype, XPDe10, and XPDe23, mutually adjusted for each other and
for the duration of smoking.

High-risk genotype

Age at diagnosis   High-risk genotype   Rate Ratio (RR)   Confidence Interval (CI)   P-value
50-55              No                   1
                   Yes                  4.43              (1.45-13.56)               0.009
56-60              No                   1
                   Yes                  0.73              (0.30-1.83)                0.51
61-70              No                   1
                   Yes                  0.93                                         0.62

XPDe10

Age at diagnosis   Genotype   Rate Ratio (RR)   Confidence Interval (CI)   P-value (trend)
50-55              GG         1                                            0.99
                   AG         2.78              (0.57-13.7)
                   AA         1.2               (0.14-10.4)
56-60              GG         1                                            0.17
                   AG         0.46              (0.18-1.20)
                   AA         0.41              (0.09-1.93)
61-70              GG         1                                            0.40
                   AG         0.91              (0.46-1.80)
                   AA         0.64              (0.25-1.64)

XPDe23

Age at diagnosis   Genotype   Rate Ratio (RR)   Confidence Interval (CI)   P-value (trend)
50-55              AA         1                                            0.25
                   AC         1.69              (0.34-8.41)
                   CC         3.62              (0.39-33.6)
56-60              AA         1                                            0.11
                   AC         1.90              (0.73-4.92)
                   CC         3.40              (0.71-16.3)
61-70              AA         1                                            0.08
                   AC         1.86              (0.95-3.63)
                   CC         2.23              (0.79-6.31)

Table 12. Risk of lung cancer among those 50-55 years in association with the
high-risk genotype and XPDe23, mutually adjusted for each other and for the duration
of smoking.

Polymorphism        Genotype   Rate Ratio (RR)   Confidence Interval (CI)   P-value
High-risk group 1   No         1
                    Yes        4.27              (1.42-12.89)               0.01
XPDe23              AA         1                                            0.01 2
                    AC         3.20              (1.13-9.02)
                    CC         5.02              (1.32-19.1)

1 Homozygous for the high-risk alleles of RAIi1, ASE1e1 and ERCC1e4 (Example 9)
2 Trend test

Example 10



In some of the samples of Example 6 we typed a 4 bp deletion (dbSNP#3916791)
located in the common portion of the sequences S1, S2 and S3 contiguous with
sequence SEQ ID NO: 1. Specifically, the polymorphism is contained in the sequence
GCGCCTGCCAAGATTGTCTGAGTATTGATCGAACCC, where the bases
represented with boldface, italicised letters are present in some human chromosomes
19 but not all. The deletion was typed as follows. (1) A PCR was performed on the
person's DNA with the primers 5'-FAM-TGAGACGAGGTGGAGG-3' and
5'-CAATCAAAAAGAAAACATGG-3'. The fluorescein-containing (3-FAM) primer
was obtained from TIB-MOLBIOL (Berlin, Germany), while the other primer was
obtained from DNA Technology (Aarhus, Denmark). The reaction mix contained
0.84 U Taq polymerase (Roche), 1.7 nmole of each dNTP, 5 pmole of each primer,
1X PCR buffer (Roche), 1 M betaine and approximately 20 ng DNA in a total volume
of 9 µl. We used a temperature program containing 4 min denaturation at 94 °C,
followed by 30 cycles of 96 °C for 1 min, 55 °C for 30 sec and 72 °C for 45 sec.
(2) We then mixed a sample containing 1 µl PCR product, 0.5 µl GeneScan-500 ROX
size marker (Applied Biosystems) and 19 µl formamide. (3) We loaded the sample onto
a single lane of Sequagel-6 matrix on a model 3100 Genetic Analyzer (ABI Prism,
Applied Biosystems) using fluorescence detection. The persons who were homozygous
for the complete fragment gave a length of 167 bp relative to the size markers,
the persons who were homozygous for the 4 bp deletion gave a length of 163 bp,
and the heterozygotes showed both lengths in roughly equimolar amounts. Because
we have repeatedly observed that the underlying risk genotype seems recessive
(Examples 2, 6, 7, 8), we pooled the homozygous low-risk genotypes (163/163) and
the heterozygotes (163/167).
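
The genotype assignment and the recessive pooling described above can be sketched as follows. This is an illustration only: the function names and the 1 bp sizing tolerance are assumptions, not part of the described procedure, and real fragment-analysis output would need peak-height filtering that is omitted here.

```python
def call_genotype(peak_sizes_bp, tolerance_bp=1.0):
    """Assign a 4 bp deletion genotype from the fragment lengths seen in one lane.

    peak_sizes_bp: sizes (in bp, relative to the size marker) of the detected peaks.
    tolerance_bp:  hypothetical sizing tolerance; not specified in the text.
    """
    has_163 = any(abs(size - 163) <= tolerance_bp for size in peak_sizes_bp)
    has_167 = any(abs(size - 167) <= tolerance_bp for size in peak_sizes_bp)
    if has_163 and has_167:
        return "163/167"   # heterozygote: both fragment lengths present
    if has_167:
        return "167/167"   # homozygous for the complete fragment
    if has_163:
        return "163/163"   # homozygous for the 4 bp deletion
    raise ValueError("no peak near 163 bp or 167 bp")

def pooled_group(genotype):
    """Recessive pooling: 163/163 and 163/167 are combined as the reference group."""
    return "167/167" if genotype == "167/167" else "163/163 + 163/167"

print(call_genotype([163.2, 166.8]), "->", pooled_group(call_genotype([163.2, 166.8])))
```

Pooling the heterozygotes with the 163/163 homozygotes gives a single two-by-two comparison of cases and controls, which is the layout used for the odds ratio calculation.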

Table 13 shows the observed genotype frequencies among the cases and controls,
the odds ratios for the genotypes, the confidence intervals, and the p-values for the
odds ratios. Clearly, homozygosity for the 167 bp fragment was associated with
increased risk of breast cancer.

Table 13. Risk of breast cancer in association with genotypes of the 4 bp deletion in
S1.

Genotype            Number of cases   Number of controls   Odds Ratio (OR)   Confidence Interval (CI)   P-value
163/163 + 163/167   92                129                  1
167/167             60                44                   1.91              (1.19-3.07)                0.007
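
The odds ratio in Table 13 follows directly from the four counts. The text does not state how the confidence interval was obtained; the sketch below assumes the common Woolf (log-scale) interval, and the function name odds_ratio_woolf is hypothetical.

```python
import math

def odds_ratio_woolf(cases_exposed, controls_exposed, cases_ref, controls_ref,
                     z_crit=1.959964):
    """Odds ratio with a Woolf (log-scale) 95 % confidence interval.

    'exposed' is the 167/167 group; 'ref' is the pooled 163/163 + 163/167 group.
    """
    odds_ratio = (cases_exposed * controls_ref) / (controls_exposed * cases_ref)
    se_log = math.sqrt(1 / cases_exposed + 1 / controls_exposed
                       + 1 / cases_ref + 1 / controls_ref)
    lower = math.exp(math.log(odds_ratio) - z_crit * se_log)
    upper = math.exp(math.log(odds_ratio) + z_crit * se_log)
    return odds_ratio, lower, upper

odds_ratio, lower, upper = odds_ratio_woolf(60, 44, 92, 129)
print(f"OR = {odds_ratio:.2f} ({lower:.2f}-{upper:.2f})")  # prints "OR = 1.91 (1.19-3.07)"
```

With the counts of Table 13 this assumed interval agrees with the tabulated odds ratio and confidence interval to two decimals.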



Example 11 



The blood samples described in Example 9 were analysed for the 4 bp deletion described
in Example 10, and the results were combined with previous results for the
polymorphism XPDe23. As a preliminary investigation showed the effects of the
genotypes to be largely additive, we grouped the persons according to the number
of "risk" alleles they were carrying, using XPDe23 AA 4bp 163/163 as the lowest risk
and thus placing those persons in group 0, and furthermore using them as reference
for the calculation of the odds ratios. Table 14 shows the number of cases and
controls in the different groups, the odds ratios for the different groups, the confidence
intervals for the odds ratios and the p-values for the odds ratios (calculated
by the two-sided Fisher's exact test). Clearly, the risk of lung cancer increased
dramatically with the number of risk alleles.
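
The grouping by number of "risk" alleles can be sketched as a simple count. The sketch assumes, from the reference group XPDe23 AA 4bp 163/163 being group 0, that the C allele of XPDe23 and the 167 bp allele of the 4 bp polymorphism are the alleles counted; the function name is hypothetical.

```python
from itertools import product

def count_risk_alleles(xpd_e23, four_bp):
    """Number of risk alleles (0-4) carried at XPDe23 and the 4 bp polymorphism.

    xpd_e23: "AA", "AC" or "CC"; four_bp: "163/163", "163/167" or "167/167".
    Assumption: C (XPDe23) and 167 (4 bp) are the alleles counted as "risk".
    """
    return xpd_e23.upper().count("C") + four_bp.split("/").count("167")

# Enumerate which combined genotypes fall into each group.
groups = {}
for xpd, frag in product(["AA", "AC", "CC"], ["163/163", "163/167", "167/167"]):
    groups.setdefault(count_risk_alleles(xpd, frag), []).append(f"XPDe23 {xpd} 4bp {frag}")

for n_alleles in sorted(groups):
    print(n_alleles, "; ".join(groups[n_alleles]))
```

The nine combined genotypes then fall into five groups carrying 0 to 4 risk alleles.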

Table 14. Risk of lung cancer according to the number of "risk" alleles in the
polymorphisms 4bp and XPDe23.

Number of "risk" alleles   Number of cases   Number of controls   Odds Ratio (OR)   Confidence Interval (CI)   P-value
0 1                        3                 12                   1
1 2                        57                73                   3.12              (0.84-11.6)                0.10
2 3                        123               129                  3.81              (1.05-13.8)                0.034
3 4                        49                35                   5.6               (1.47-21.3)                0.01
4 5                        4                 1                    16                (1.27-200)                 0.03

1 XPDe23 AA 4bp 163/163
2 XPDe23 AC 4bp 163/163, and XPDe23 AA 4bp 163/167
3 XPDe23 CC 4bp 163/163, XPDe23 AC 4bp 163/167, and XPDe23 AA 4bp 167/167
4 XPDe23 CC 4bp 163/167, and XPDe23 AC 4bp 167/167
5 XPDe23 CC 4bp 167/167
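
The odds ratios of Table 14 can be recomputed from the counts, each group being compared with group 0. The sketch below also implements a two-sided Fisher's exact test in plain Python; the definition used for "two-sided" (summing all tables no more probable than the observed one) is an assumption, as the text does not specify it, and the function names are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    p_observed = prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(x_min, x_max + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# (cases, controls) per number of risk alleles, taken from Table 14.
counts = {0: (3, 12), 1: (57, 73), 2: (123, 129), 3: (49, 35), 4: (4, 1)}
ref_cases, ref_controls = counts[0]

odds_ratios = {}
p_values = {}
for n_alleles, (cases, controls) in counts.items():
    if n_alleles == 0:
        continue
    odds_ratios[n_alleles] = (cases * ref_controls) / (controls * ref_cases)
    p_values[n_alleles] = fisher_exact_two_sided(cases, controls, ref_cases, ref_controls)
    print(n_alleles, round(odds_ratios[n_alleles], 2), round(p_values[n_alleles], 3))
```

The recomputed odds ratios agree with the tabulated values, and for the group with four risk alleles the exact p-value works out to about 0.03.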



Example 12 



The data of Examples 9 and 11 were combined, and relative risks for lung cancer for
the high-risk haplotype, the 4 bp deletion, and XPDe23, mutually adjusted for each
other, were calculated in 3 age groups. The use of adjusted relative risks ensures that
the effect of each marker is peculiar to it and cannot be attributed to any of the other
markers in question. Tables 15, 16 and 17 show the results. After the adjustment it is
apparent that all three markers have an effect independent of the others. Moreover,
the adjusted effect of the high-risk haplotype is strongest among the youngest persons,
while the adjusted effect of the 4 bp deletion is strongest in the oldest age
group. XPDe23 exerts its adjusted effect at all ages, but possibly strongest in the
youngest age group.



Table 15. Relative risks and 95 percent confidence intervals for lung cancer in
different age groups as a reflection of presence or absence of the high-risk
haplotype in homozygous form, adjusted for the 4 bp deletion and XPDe23.

Age at diagnosis (yr)   Homozygous a   RR     95% CI
50-55                   No             1.00
                        Yes            4.26   1.38-13.17
56-60                   No             1.00
                        Yes            1.07   0.36-2.98
61-70                   No             1.00
                        Yes            0.82   0.44-1.53

a) Homozygous carriers of high-risk haplotype are defined as ERCC1 exon4 GG, ASE-1 exon1 GG, RAI intron

Table 16. Relative risks and 95 percent confidence intervals and p-values for
trend for lung cancer in different age groups as a reflection of alleles at the 4
bp deletion site, adjusted for XPDe23 and the high-risk haplotype.

Age at diagnosis (yr)   Allele    RR     95% CI      P (trend)
50-55                   163/163   1.00               0.31
                        163/167   1.35   0.36-5.02
                        167/167   0.35   0.11-2.87
56-60                   163/163   1.00               0.75
                        163/167   1.76   0.58-5.38
                        167/167   1.04   0.26-4.14
61-70                   163/163   1.00               0.02
                        163/167   0.67   0.36-1.22
                        167/167   0.36   0.16-0.82

Table 17. Relative risks and 95 percent confidence intervals for lung cancer in
different age groups as a reflection of alleles at the XPDe23 site, adjusted for
the high-risk haplotype and the 4 bp deletion.

Age at diagnosis (yr)   Allele   RR     95% CI
50-55                   AA       1.00
                        AC       3.13   0.95-10.33
                        CC       7.86   1.78-34.64
56-60                   AA       1.00
                        AC       1.33   0.60-2.95
                        CC       1.95   0.63-6.06
61-70                   AA       1.00
                        AC       1.81   1.07-3.07
                        CC       2.54   1.16-5.56

Example 13

The data of Example 9 concerning the high-risk haplotype were stratified according
to age and gender and adjusted for smoking. The results are shown in Table 18. It is
obvious that most of the effect of the high-risk haplotype on risk of lung cancer is
exerted on the young women, while the effect on men at best is very moderate.

Table 18. Sex and age group specific estimates of the lung cancer rate ratios (RR)
in association with the high-risk haplotype, adjusted for duration of smoking.

Age group   Homozygous for haplotype a   Female RR (95% CI)   p       Male RR (95% CI)   p
50-55       No                           1.0                          1.0                0.75
            Yes                          7.02 (1.88-26.18)    0.004   0.80 (0.20-3.18)
56-60       No                           1.0                          1.0                0.37
            Yes                          1.03 (0.29-3.71)     0.97    0.69 (0.30-1.58)
61-70       No                           1.0                  0.76    1.0                0.94
            Yes                          0.89 (0.40-0.76)             1.03 (0.48-2.22)

a) Homozygous carriers of high-risk haplotype are defined as ERCC1 exon4 GG, ASE-1 exon1 GG, RAI intron

Claims:



1. A method for estimating the cancer risk of an individual comprising

- providing a sample from said individual,

- assessing in the genetic material in said sample a sequence polymorphism

- in a region corresponding to SEQ ID NO: 1, or a part thereof, or

- in a region complementary to SEQ ID NO: 1, or a part thereof, or

- in a transcription product from a sequence in a region corresponding to SEQ
ID NO: 1, or a part thereof, or

- in a translation product from a sequence in a region corresponding to SEQ ID
NO: 1, or a part thereof,

- obtaining a sequence polymorphism response,

- estimating the cancer risk of said individual based on the sequence polymorphism
response.

2. The method according to claim 1, wherein the cell sample is a blood sample, a
tissue sample, a sample of secretion, semen, ovum, a washing of a body surface,
such as a buccal swab, a clipping of a body surface, including hairs and
nails.

3. The method according to any of the preceding claims, wherein the cell is selected
from white blood cells and tumor tissue.

4. The method according to any of the preceding claims, wherein the sequence
polymorphism comprises at least one mutation base change.

5. The method according to any of the preceding claims, wherein the sequence
polymorphism comprises at least two base changes.

6. The method according to any of the preceding claims, wherein the sequence
polymorphism comprises at least one single nucleotide polymorphism.

7. The method according to any of the preceding claims, wherein the sequence
polymorphism comprises at least two single nucleotide polymorphisms.

8. The method according to any of the preceding claims, wherein the sequence
polymorphism comprises at least one tandem repeat polymorphism.

9. The method according to any of the preceding claims, wherein the sequence
polymorphism comprises at least two tandem repeat polymorphisms.

10. The method according to any of the preceding claims, wherein the cancer is selected
from skin carcinoma including malignant melanoma, breast cancer, lung
cancer, colon cancer and other cancers in the gastrointestinal tract, prostate
cancer, lymphoma, leukemia, pancreas cancer, head and neck cancer, ovary
cancer and other gynecological cancers.

11. The method according to any of the preceding claims, wherein the cancer is selected
from skin cancer, lung cancer, colon cancer and breast cancer.

12. The method according to any of the preceding claims, wherein the cancer is selected
from skin cancer and breast cancer.

13. The method according to any of the preceding claims 10-12, wherein the skin
cancer is basal cell carcinoma.

14. The method according to any of the preceding claims, wherein the assessment
is conducted by means of at least one nucleic acid primer or probe, such as a
primer or probe of DNA, RNA or a nucleic acid analogue such as peptide nucleic
acid (PNA) or locked nucleic acid (LNA).

15. The method according to claim 14, wherein the nucleotide primer or probe is
capable of hybridising to a subsequence of the region corresponding to SEQ ID
NO: 1, or a part thereof, or a region complementary to SEQ ID NO: 1.

16. The method according to claim 14, wherein the primer or probe has a length of
at least 9 nucleotide or peptide monomers.

17. The method according to any of the preceding claims 14-16, wherein at least
one primer or probe is capable of hybridising to a subsequence selected from 
the group of subsequences 

1. GCTCTGAAAC TTACTAGCCC (A/G) GTATTTATGG AGAGGCATTT
2. GTGGTCAAAT TCTCATTCAT CGTGG (T/C) CCAGGCAAGC ACACTTCCTC
3. ACCCTGAGGT GAGCACCTGT TCCTT (C/T) TCCTTGCCCT TAGCCCAGAG GTAGA
4. GGGCAGGGGT TTGTGCCTCC AATGA (G/A) CACAAGCTCC CCCTGCCCCC CAACT
5. CCTGGCGGTG GCCGTCACCA GCTTT (T/C) GGGGGTGTTT GGGAAGCTGG
6. CTCCAGCCCC ACTGTTCCCT (A/G) GGCCCTATTG GTCCCCCTGG
7. ACAAGGAGGA GGCAGAAGTG AGGTT (G/C) AAACCCACTG CCCAATCTTA
8. CCAACACGGT GAAACCCCGT CTGTA (T/C) TAAAAATACA AAAATTAGCC
9. AATCCAGGAC CCCATAATCT TCCGT (C/T) ATCTAAAACA ATAATGGTGA
10. CCCAAGGGGG CGAGGGGAGG GTGAA (A/G) GGGTGGGACG GGGGCAGCCG
11. GAAGTGAGAA GGGGGCTGGG GGTCG (G/-) CGCTCGCTAG CGGGCGCGGG
12. CGCACGCGCA GTATCCCGAT TGGCT (C/G) TGCCCTAGCG GATTGACGGG
13. AACTCCTGGG TTCGATCAAT ACTCA (GACA/-) ATCTTGGCAG GCGCAGGAGG
14. GCTGGGATTA CAGGCTTGAG CCACC (A/G) CGCCCGGCCT GCAAAGCCAT
15. TTTTGTATCT TTAGTAGAGA CAGG (T/G) TTTCTCCATG TTGGTCAGGC
16. GCCTCAGCCT CCCGAGTAGC TGAGACT (C/A) CAGGTGCCCG CCACCACGCC
17. TGAAATTGTA GGTTGAGAGG CCAGGCG (C/T) GGTGCTCACG CCTGTAATTT
18. GTTTATAAAC ATTAAACCAG (T/A) GCTGTGTGAA GGCACTTAAT
19. CCGTCTCTAT TAAAAATATA AAA (A/C) AATTTAGCCG GGTGTAGCGG
20. GGGAGGCTCG AGGCGGGC (A/G) GATTGCATGA GCTCAGGATT
21. TCCCAAGTTT CAGGGCCCAA (T/G) ATTCTCAAAT CACAGGATTC
22. TGCAGTGAGC TGAGATCGC (A/G) CCACTGCACT CCAGCCTGGG
23. TCTTAGGACG CATGGGGGT (T/G) GAGAGAACGG GGAGATAGAC
24. CTGGGTTCTA GAACTACC (C/T) ATGCAAACCC AGCTGTTTCC
25. ATTCTGCCCT GGGTTCTAGA ACTACCT (C/A) TGCAAACCCA GCTGTTTCCC
26. GCTGTTTCCC ACCCCATAAG GCA (A/G) TAGGGGAGCC CACCTCCGCC
27. GACCTAGAAG ATCGGTCGAG A (C/T) AGCAGCTTGA GGCTGGCAGG
28. CTGGCCAGGA ATGCAGTCGG GTCAC (C/T) CTGTCTAGCC ACCGTCTCGC
29. GGGAGGAGTC GCCGATCAGG (C/T) CCCTTCCTGA AAGTCATCGA
30. GCAGCCCGGG CTACAGGGTT (A/G) CCTGAGGTGT GGGTCCCAGG
31. TAGAAATACT AACAAAGGGC (T/C) GTGGGTTTCT CCCCCTGCTT
32. ACAGGAGAGG GAAGGTTTTTTG (A/T) TTTTTTTTTT GTTTTTTTTT
33. GAAGAGGAAG AAGCCCAAAG GGA (A/C) AGAAACCTTC GAGCCAGAAG
34. GCGCCTCAAC AGCCAGAAGG AGCG (A/G) AGCCTCAGGC CCAGGCAGCT
35. TTGAGACTCT CTGTTTGAT (A/G) CTTCACTCAG AAGGTGCTTC
36. AGGCCAGGCT CCTGCTGGCT G (C/G) GCTGGTGCAG TCTCTGGGGA
37. CCCCTATACC CTCAAGCAT (C/T) TATCCATTGA GTTACAAACA
38. ACCATCCCCC GCCTTCCGTT (A/C) GTCCGGCCCC CGAGGCTAGC

or to a sequence complementary to any of the subsequences. 
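
The subsequence notation used in the list above, flanking sequence with the polymorphic site in parentheses, alleles separated by "/" and "-" marking an absent allele, can be read mechanically. The following is a minimal sketch, assuming Python; the function names `expand_alleles` and `reverse_complement` are hypothetical and do not appear in the application. It expands one list entry into one contiguous sequence per allele and derives the complementary strand to which a probe may alternatively hybridise.

```python
import re
from typing import List

# Matches the parenthesised polymorphic site, e.g. "(A/G)" or "(GACA/-)".
_SITE = re.compile(r"\(([^()]*)\)")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expand_alleles(subsequence: str) -> List[str]:
    """Return one contiguous sequence per allele of a 'FLANK (X/Y) FLANK' entry."""
    compact = subsequence.replace(" ", "").upper()
    match = _SITE.search(compact)
    if match is None:
        raise ValueError("entry contains no parenthesised polymorphic site")
    left, right = compact[:match.start()], compact[match.end():]
    # "-" is taken to mean that the bases are absent (deletion allele).
    return [left + ("" if allele == "-" else allele) + right
            for allele in match.group(1).split("/")]


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of an upper-case A/C/G/T sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


# Entries 18 and 13 of the list, copied from the claim text.
entry_18 = "GTTTATAAAC ATTAAACCAG (T/A) GCTGTGTGAA GGCACTTAAT"
entry_13 = "AACTCCTGGG TTCGATCAAT ACTCA (GACA/-) ATCTTGGCAG GCGCAGGAGG"
```

Calling `expand_alleles(entry_13)` yields two sequences that differ in length by the four deleted bases, and `reverse_complement` applied to either allele gives the complementary subsequence.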



18. The method according to claim 17, wherein at least one nucleotide probe is
selected from the group consisting of
1. TGAAATTGTA GGTTGAGAGG CCAGGCG (C/T) GGTGCTCACG CCTGTAATTT
2. GTTTATAAAC ATTAAACCAG (T/A) GCTGTGTGAA GGCACTTAAT
3. CCGTCTCTAT TAAAAATATA AAA (A/C) AATTTAGCCG GGTGTAGCGG
4. GGGAGGCTCG AGGCGGGC (A/G) GATTGCATGA GCTCAGGATT
5. TCCCAAGTTT CAGGGCCCAA (T/G) ATTCTCAAAT CACAGGATTC
6. TGCAGTGAGC TGAGATCGC (A/G) CCACTGCACT CCAGCCTGGG
7. TCTTAGGACG CATGGGGGT (T/G) GAGAGAACGG GGAGATAGAC
8. CTGGGTTCTA GAACTACC (C/T) ATGCAAACCC AGCTGTTTCC
9. ATTCTGCCCT GGGTTCTAGA ACTACCT (C/A) TGCAAACCCA GCTGTTTCCC
10. GCTGTTTCCC ACCCCATAAG GCA (A/G) TAGGGGAGCC CACCTCCGCC
11. GACCTAGAAG ATCGGTCGAG A (C/T) AGCAGCTTGA GGCTGGCAGG
12. CTGGCCAGGA ATGCAGTCGG GTCAC (C/T) CTGTCTAGCC ACCGTCTCGC
13. GGGAGGAGTC GCCGATCAGG (C/T) CCCTTCCTGA AAGTCATCGA
14. GCAGCCCGGG CTACAGGGTT (A/G) CCTGAGGTGT GGGTCCCAGG
15. TAGAAATACT AACAAAGGGC (T/C) GTGGGTTTCT CCCCCTGCTT
16. ACAGGAGAGG GAAGGTTTTTTG (A/T) TTTTTTTTTT GTTTTTTTTT
17. GAAGAGGAAG AAGCCCAAAG GGA (A/C) AGAAACCTTC GAGCCAGAAG
18. GCGCCTCAAC AGCCAGAAGG AGCG (A/G) AGCCTCAGGC CCAGGCAGCT

or to a sequence complementary to any of the subsequences. 

19. The method according to claim 18, wherein at least one nucleotide probe is
selected from the group consisting of

1. GTTTATAAAC ATTAAACCAG (T/A) GCTGTGTGAA GGCACTTAAT
2. CCGTCTCTAT TAAAAATATA AAA (A/C) AATTTAGCCG GGTGTAGCGG
3. GGGAGGCTCG AGGCGGGC (A/G) GATTGCATGA GCTCAGGATT
4. TCCCAAGTTT CAGGGCCCAA (T/G) ATTCTCAAAT CACAGGATTC
5. TGCAGTGAGC TGAGATCGC (A/G) CCACTGCACT CCAGCCTGGG
or to a sequence complementary to any of the subsequences. 

20. The method according to any of the preceding claims, wherein at least one
sequence polymorphism is assessed in a region corresponding to SEQ ID NO: 1
position 1521-37752 (r).

21. The method according to any of the preceding claims, wherein at least one
sequence polymorphism is assessed in a region corresponding to SEQ ID NO: 1
position 7760-22885 (RAI).

22. The method according to any of the preceding claims, wherein at least one
sequence polymorphism is assessed in a region corresponding to SEQ ID NO: 1
position 34391-37752.
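
The three position ranges recited in claims 20-22 overlap, so a single polymorphism may fall within more than one of them. The following minimal sketch, assuming Python, looks up which ranges contain a given SEQ ID NO: 1 position. The ranges are copied from the claims; the dictionary labels, the function name `regions_containing` and the treatment of both end positions as inclusive are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# (start, end) positions in SEQ ID NO: 1 as recited in claims 20-22.
# Labels are illustrative; end points are assumed to be inclusive.
REGIONS: Dict[str, Tuple[int, int]] = {
    "claim 20 (r)": (1521, 37752),
    "claim 21 (RAI)": (7760, 22885),
    "claim 22": (34391, 37752),
}


def regions_containing(position: int) -> List[str]:
    """Return the labels of all recited ranges that contain the position."""
    return [label for label, (start, end) in REGIONS.items()
            if start <= position <= end]
```

For example, a hypothetical position 8000 lies in both the claim 20 and the claim 21 range, whereas a position below 1521 lies in none of them.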

23. The method according to any of the preceding claims, wherein at least two
different probes are used, one probe being selected from the probes as defined
in any of claims 17-21, and the other probe being capable of hybridising to a
sequence different from SEQ ID NO: 1, or a part thereof, or to a sequence
complementary to a region different from SEQ ID NO: 1, or a part thereof.

24. The method according to claim 1, wherein the translational product from a
sequence in a region corresponding to SEQ ID NO: 1, or a part thereof, is an
antibody, such as a monoclonal or polyclonal antibody.

25. A method for estimating the cancer prognosis of an individual comprising
- providing a sample from said individual,
- assessing in the genetic material in said sample a sequence polymorphism
  - in a region corresponding to SEQ ID NO: 1, or a part thereof, or
  - in a region complementary to SEQ ID NO: 1, or a part thereof, or
  - in a transcription product from a sequence in a region corresponding to
    SEQ ID NO: 1, or a part thereof, or
  - in a translation product from a sequence in a region corresponding to
    SEQ ID NO: 1, or a part thereof,
- obtaining a sequence polymorphism response,
- estimating the cancer prognosis of said individual based on the sequence
  polymorphism response.

26. The method according to claim 25, wherein the method has any of the features
as defined in any of the claims 2-24.

27. A method for estimating a treatment response of an individual suffering from
cancer to a cancer treatment, comprising
- providing a sample from said individual,
- assessing in the genetic material in said sample a sequence polymorphism
  - in a region corresponding to SEQ ID NO: 1, or a part thereof, or
  - in a region complementary to SEQ ID NO: 1, or a part thereof, or
  - in a transcription product from a sequence in a region corresponding to
    SEQ ID NO: 1, or a part thereof, or
  - in a translation product from a sequence in a region corresponding to
    SEQ ID NO: 1, or a part thereof,
- obtaining a sequence polymorphism response,
- estimating the individual's response to the cancer treatment based on the
  sequence polymorphism response.



28. The method according to claim 27, wherein the method has any of the features
as defined in any of the claims 2-24.

29. A primer or probe for use in a method as defined in any of the claims above,
said primer or probe being selected from

TGGCTAACACGGTGAAACC (SEQ ID NO:7)
GGAATCCAAAGATTCTATGATGG (SEQ ID NO:8)
GGGAGGCGGAGCTTGCAGTGA (SEQ ID NO:9)
CTGAGATCGCACCACTGCAC (SEQ ID NO:10)
GGTTTTCTGCTCTGCACACG (SEQ ID NO:11)
CCTTTCTCCTTCCACCAACG (SEQ ID NO:12)
CGGGCTACAGGGTTACCTGAG (SEQ ID NO:13)
TCTGCAACCTGGTGCGAGCAGC (SEQ ID NO:14)
CCTACCACCATCATCACATCC (SEQ ID NO:15)
GCCTTGCCAAAAATCATAACC (SEQ ID NO:16)
CCTCTCCCCAATTAAGTGCCTTCACACAGC (SEQ ID NO:17)
AGCCAGGGAGGTTGAGGCT (SEQ ID NO:18)
AGACAGCCCTGAATCAGCAC (SEQ ID NO:19)
GCAATGAGCCGAGATAGAA (SEQ ID NO:20)
TGGCTAGCCCATTACTCTA (SEQ ID NO:21)

30. A primer or probe for use in a method as defined in any of the claims above as 
the other probe 



GCCCCGTCCCAGGTA (SEQ ID NO:21)
AGCCCCAAGACCCTTTCACT (SEQ ID NO:22)
GTCCCATAGATAGGAGTGAAAG (SEQ ID NO:23)
CCCTAGGACACAGGAGCACA (SEQ ID NO:24)
TTGTGCTTTCTCTGTGTCCA (SEQ ID NO:25)
TATCAGAAAAGGCTGGAGGA (SEQ ID NO:26)
GAGTGGCTGGGGAGTAGGA (SEQ ID NO:27)
GCCAAGCAGAAGAGACAAA (SEQ ID NO:28)
CCTCAGATGTCCTCTGCTCA (SEQ ID NO:29)
GCCACAGCCCCAGCAAGTAG (SEQ ID NO:30)
AGGACCACAGGACACGCAGA (SEQ ID NO:31)
CATAGAACAGTCCAGAACAC (SEQ ID NO:32)
TTAGCTTGGCACGGCTGTCCAAGGA (SEQ ID NO:33)
ACAGAATTCGCCCCGGCCTGGTACAC (SEQ ID NO:34)
TTGAAACTGGAACTCTGAGAAGG (SEQ ID NO:35)
TGGTGGATGGTGTGAAGCA (SEQ ID NO:36)
CCTTTCTCCAACTTCTTCTCCATTTCCACC (SEQ ID NO:37)
GGGGATCATGTCGTCAATGGACT (SEQ ID NO:38)
ATGCCCTGTAGGTTCAATGG (SEQ ID NO:39)
TGGAGGTCTTTAGGGGCTTG (SEQ ID NO:40)
GGCTGGTCCCCGTCTTCTCCTTCC (SEQ ID NO:41)
TCTCTGTTGCCACTTCAGCCTC (SEQ ID NO:42)
GTCCTGCCCTCAGCAAAGAGAA (SEQ ID NO:43)
TTCTCCTGCGATTAAAGGCTGT (SEQ ID NO:44)
ATCCTGTCCCTACTGGCCATTC (SEQ ID NO:45)
TGTGGACGTGACAGTGAGAAAT (SEQ ID NO:46)
TGGAGTGCTATGGCACGATCTCT (SEQ ID NO:47)
CCATGGGCATCAAATTCCTGGGA (SEQ ID NO:48)
CACACCTGGCTCATTTTTGTAT (SEQ ID NO:49)
TCATCCAGGTTGTAGATGCCA (SEQ ID NO:50)
AGGCTCAACAAGGAAAAATGC (SEQ ID NO:51)
GCTAGACAGTCAAGGAGGGACG (SEQ ID NO:52)
AAAGGGTGGGTGTGGGAGACATTGG (SEQ ID NO:53)
AAACCAACCTAGGCACCCCAAA (SEQ ID NO:54)
CAGTGTCCAAAGAGCACC (SEQ ID NO:55)
CTACCCCTTTAGCGACC (SEQ ID NO:56)
TCCTGCCCCCAGAGCGTCACC (SEQ ID NO:57)
GTACGGTCCACATAATTTTGGAGGA (SEQ ID NO:58)
CGACGAACTTCTCTGAAGCGAA (SEQ ID NO:59)
AGCGACACGGGCATCTGG (SEQ ID NO:60)
ATGAGCGTCCACCTCCTGAACC (SEQ ID NO:61)
AGGCAGCAGCATCGTCATCCCC (SEQ ID NO:62)
TGCATAGCTAGGT CCTGC (SEQ ID NO:63)
AACTGACRAAACTAGCTCTATGGGGTGGTGCCGCA (SEQ ID NO:64)
CTGGCTCTGAAACTTACTAGCCC (SEQ ID NO:65)
GCTGGACTGTCACCGCATG (SEQ ID NO:66)
GGAGCAGGGTTGGCGTG (SEQ ID NO:67)
TGCCCTCCCAGAGGTAAGGCCT (SEQ ID NO:68)
CCCTQCCGGAGGTAAGGCCTC (SEQ ID NO:69)
GATCAAAGAGACAGACGAGC (SEQ ID NO:70)
GAAGCCCAGGAAATGC (SEQ ID NO:71)
GGACGCCCACCTGGCCAACC (SEQ ID NO:72)
CGTGCTGCCCAACGAAGTG (SEQ ID NO:73)

31. The primer or probe according to any of claims 28 or 29, wherein the probe is
operably linked to at least one label, such as operably linked to two different
labels.

32. The probe according to claim 30, wherein the label is selected from TEX, TET,
TAM, ROX, R6G, ORG, HEX, FLU, FAM, DABSYL, Cy7, Cy5, Cy3, BOFL, BOF,
BO-X, BO-TRX, BO-TMR, JOE, 6JOE, VIC, 6FAM, LCRed640, LCRed705,
TAMRA, Biotin, Digoxigenin, DuO-family, Daq-family.

33. The primer or probe according to any of claims 28-31, wherein the primer or
probe is operably linked to a surface.

34. The primer or probe according to claim 32, wherein the surface is the surface of
microbeads or a DNA chip.

35. An antibody directed to an epitope of a RAI gene product.



36. A kit for use in a method as defined in any of the claims above, comprising at
least one primer or probe, said probe being as defined in any of claims 29-35,
and optionally further amplifying means for nucleic acid amplification.
XPD-translation start*- •* Start sequence S1

CCCTTCTCACTTCAT GGCGCCGGCCGGACTGTGCAGCGOGGTCOACCCGCCTCCCTCA 

* XPD-TATA BOX 

TGAATATTCAGCGAGAGGCCGGGTCGTGGACATCCTCGAGGGCTCGCTCCACCZ42^r 
*- -» Start sequence S2 

TA cgagaccattggctaacctgcccgtcaatccgctagggcagagccaatcgggatac 

■ 

TCOjCGTGCGCACGGAAAAGCGACKKJCGGCTGACTCTCGGGTOAGGCGGTGCGGGAG 

~> Start sequence S3 

GCGTCACTGAGGATCGTCGAGGGCCAATCAAAAAGAAAACATGGAAGGGAAAGAGCC 

• 4bp deletion 

GAGAGACTCGATCTCATTCACTAGAATITGGTCCTCCTGCGCCTGCCAAGATTXrrCTGA 

_ End of S sequences*- -» Start of region r 

GTATTGATCGAACCCAGGAGTTCGAGATCAGCTTGAGCAAGATAGCG 

A 

S1 is the sequence from * to A

S2 is the sequence from ■ to A

S3 is the sequence from • to A